ABSTRACT


The consistency of data is threatened by (i) inadequate
concurrency controls, (ii) system crashes, (iii) security
breaches, and (iv) erroneous software. With the growing use
of terminal-oriented computer systems and with the trend
towards the distribution of system functions over a computer
network, preservation of the consistency of the database is
one of the most critical problems faced by the designers of
distributed information networks. This thesis develops a
detailed understanding of the problems caused by inadequate
concurrency controls and system crashes in a network
environment.

A new approach to deadlock detection is proposed. The
concept of "on-line" deadlock detection in distributed
information networks is introduced. It is defined to be the
process of recognizing a deadlock as soon as it occurs, at
the installation which makes the resource allocation
decision, without the need for further communication for
every request made or granted. An on-line detection
algorithm is suggested and developed. All of the earlier
algorithms restrict a process to having at most one
outstanding request. In our approach, this restriction is
removed, since in real-world applications more than one
outstanding request is a certainty. This leads to a
situation in which an allocation decision on a data resource
(with multiple waiting-access requests) released by a
completing process could lead to a deadlock. For this case,
the results and the approach suggested are new and original.
An elegant solution which combines the principles of
detection and avoidance, the first of its kind as a mixed
method in database systems, is shown to detect and avoid a
potential deadlock.

Another aspect of the problem considered is the reliable
operation of database systems partitioned and/or replicated
over a network of computers. The design of a method which
maintains database consistency during system update and
recovery is guided by the goals of simplicity, tolerable
overhead, partial operability, and avoidance of global
rollback. In this new approach, retrieval and update
transactions are subclassified, and recovery protocols are
defined which take advantage of the known properties of each
transaction class. An optimal policy for checkpointing in a
particular recovery protocol is derived using a new, simple
model. The policy determines the checkpoint dynamically as
the transactions are processed, and differs from earlier
fixed-interval methods. The scheme is new, and the
feasibility of its implementation makes it practical. The
cascading effect of a global rollback is modeled by
representing the progress of processes as a set of
interaction tuples and recovery points ordered in the time
domain. A backup algorithm based on this model is
developed. Recovery aspects for a wide set of system
failures are considered and several partial solutions are
outlined.
CHAPTER 1


INTRODUCTION


1.1 Distributed Systems

The steady "affair" of the past decade between computers
and communication technology has matured into a "marriage",
bringing with it the creation of large computer
communication complexes. Consequently, we have witnessed in
recent years the rapid evolution of computer communication
networks from research efforts to operational utilities.
There is hardly any doubt that networks of computers will
play an increasingly important role in providing enhanced
computing services to users in universities, business firms,
and government agencies. For instance, the ARPANET
[McQuillan and Walden, 1977] currently supports
communication among more than one hundred computer systems.
The network is under continual development and is used by
thousands of users daily. Computer networks can bring
computing power to the people who need it, and can provide
access to a wider variety of resources dispersed among
several computers linked by a communications facility,
forming the basis for a "distributed" computing service.
Potentially, this technology will revolutionize the way data
processing is done. Bright futures have been predicted for
commercial networks such as TELENET [Hovey, 1974] in the
United States and DATAPAC [Clipsham et al., 1976] in Canada.

A system is said to be distributed if its hardware or
processing logic, data, processing actions, or operating
system components are dispersed over multiple computers
which are logically and physically interconnected. In such a
system, data may be replicated at several sites or on
separate storage devices; the processing logic cooperates
and interacts through a communication network under
decentralized, system-wide control. A taxonomy of
distributed processing system design, applications, models,
effects, experiences and techniques has been provided
elsewhere [Chang, 1976; Le Lann, 1977; Maryanski and
Kreimer, 1976; Eckhouse et al., 1978; van Dam and Stankovic,
1978; Marsland and Sutphen, 1978]. The essential properties
and characteristics of a distributed system are summarized
below.

* A wide variety of general-purpose resources, both physical
and logical in nature, is available. These resources
can be dynamically assigned to specific transactions.
However, some special-purpose dedicated resources may
not be reassignable.

* The notion of autonomous operation of processes is
supported by the distribution of the physical and logical
resources of the system, which interact with processes
through a communication complex. The transfer of
messages follows the principle of a two-party protocol,
in which the cooperation of both parties is essential to
complete a transfer successfully.

* The various processors may have non-homogeneous operating
systems. A high-level operating system, in the form of
a collection of well-defined protocols and software,
governs the integrated functioning of the network.
However, the autonomous operation of each computer
requires the absence of a strong hierarchy between the
network operating system and the local operating
systems.

* The interface between the user and the distributed system
makes the system organization transparent. Effectively,
the user can handle resources as if he were communicating
with a single, centralized system. A high-level interface
provides data independence and hides system status from
the user. However, provision can be made for a
knowledgeable user to request services by designating the
server.

* Cooperative autonomy, rather than independent behaviour,
is the manner in which the system functions. All the
processors follow the general guidelines outlined in the
network operating system to facilitate this cooperation.
This essential component of the system offers autonomous
functioning at both the logical and physical levels.
1.2 Distributed Databases

An estimated 20% of the United States G.N.P. is devoted
to the collection, processing and dissemination of
information¹, making data management an important segment
of the United States economy.

Surprisingly, only a small portion of this information is
computerized. With such tremendous potential requirements
for automated data management, along with increases in
database size, complexity and diversity of use, and users'
strong preference for interactive computer systems, the
need for additional computing resources grows rapidly.
Replacing components with higher-performance ones is an
expensive way of growing. Distributing several system
functions and data over a network of computers has been
projected as an economic panacea for the expansion problem,
one which will provide improved performance as well as
enhanced accessibility of data [Booth, 1972, 1978; Comba,
1975; Enslow, 1978]. The evolution of modern computer
network technology and the rise of common carrier
packet-switched networks have provided motivation for the
development of distributed information networks. The
increased reliability that can be achieved with distributed
databases, and the availability of inexpensive mini- and
micro-computers due to falling hardware costs, have
contributed to the widespread interest in such systems.
Distributed databases will clearly meet the future
challenges of effective information management.
Comprehensive reviews of distributed database management
have appeared in several articles [Deppe and Fry, 1976;
Lowenthal, 1977; Rothnie and Goodman, 1977; Peebles and
Manning, 1978; Maryanski, 1978; CODASYL, 1978; Davenport,
1978; Adiba et al., 1978; Rothnie et al., 1979].

¹ Frost and Sullivan, Inc., Markets for Data Base Services,
New York, July 1973, p. 11.

Figure 1.1: A Distributed Information Network

Typically, a distributed database management system
consists of two or more computers interconnected by a
communication network. Each computer has a database
attached to its auxiliary storage. An example of a
distributed database is depicted schematically in Figure
1.1. Each processor is a general-purpose system, a
backend [Canaday et al., 1974], or a database machine
[Baum and Hsiao, 1976; Banerjee, Baum and Hsiao, 1978;
Hsiao, 1979]. Front-end communication processors are
interconnected by communication links, such as coaxial
cables, wire pairs, or microwave channels. These front-ends
provide the interface through which host computers are
connected to the communication facility, and also cooperate
to support communication between hosts. Each host runs a
database management system which supports one or more
interactive users.

1.3 Centralized versus Distributed Approach

In the past, terminal-based computer communication systems
have generally been focused on a single, large computer
installation. Although a fair argument can still be made
for servicing remote users with a centralized system, the
major deficiencies of such a system have contributed to
widespread belief in the distributed approach. With a
centralized system, data communication costs are
significant. The steady cost of transmitting data, in
contrast to the drastic fall in hardware costs, further
motivates the decentralization of data management. Despite
the economic benefits of centralized systems in areas such
as operations, managerial personnel have become aware of
the undesirable side effects of such a system. The
increasing demand of a growing community of remote users for
interactive systems, and the consequent need to deal with
more concurrent events, increase the complexity of
servicing them with a centralized system. Moreover, the
failure of the central node brings an organization to a
standstill. The requirement to endorse and enforce
standardized, centralized data processing project
development often runs contrary to the management
philosophy and needs of hierarchically decentralized
organizations.

Besides overcoming the shortcomings noted above,
distributed databases offer several major advantages. These
are generally regarded as including:

(a) Reliability: With data redundantly stored on multiple
computers, the system is not susceptible to total
failure when a single computer component breaks down.

(b) Responsiveness: The close proximity of data enhances
accessibility of resources and improves system
performance. Higher throughput is obtained because of
parallel processing.

(c) Shareability: The ability to share data among several
geographically separated installations, and to gain
access to specialized resources that would otherwise be
unavailable or available only at an unacceptable cost,
is enhanced.

(d) Expansion: The system lends itself to incremental upward
scaling. It can be developed on an incremental basis,
with only as much computing power installed as is
required at the time. Distributed systems can absorb
new technology as it is invented, without requiring that
in-place systems be returned to the vendor or scrapped.

(e) Human factors: Individual groups can physically possess
the part of the corporate database which holds their
data, giving them the responsibility and the
satisfaction of updating their own database.

Another favourable human factor is the reduced
vulnerability of the database to strike action or acts
of terrorism.

1.4 Maintenance of Semantic Correctness of Data

A database is regarded as a model of some limited
universe rather than just a collection of values. At every
instant of time, the contents of the database reflect some
configuration of that application world. Whether any given
configuration of the database is reasonable or not is
specified by a set of rules (integrity constraints).
Ensuring that these constraints hold guarantees the
integrity of the data or, more precisely, its accuracy,
consistency and timeliness.

When the database contents comply with constraints
derived from knowledge about the meaning of the data, the
semantic correctness of the data is said to be preserved.
Semantic correctness can be enforced by allowing only a set
of precisely specified, meaningful operations on the
database; by adopting a set of programming and interaction
conventions; by dynamically checking the results of updates;
or by proving for each update process that the integrity
constraints are satisfied.
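The third enforcement strategy, dynamically checking the results of updates, can be sketched in a few lines. This is a minimal illustration that assumes the database is a simple mapping from item names to integer values and that each integrity constraint is a predicate over the whole database state; the names `apply_checked` and `Constraint` are hypothetical and are not taken from any particular system.

```python
from typing import Callable, Dict, List

Database = Dict[str, int]
Constraint = Callable[[Database], bool]   # True when the state is valid

def apply_checked(db: Database, update: Dict[str, int],
                  constraints: List[Constraint]) -> bool:
    """Install `update` only if every constraint holds afterwards.

    Returns True if the update was accepted and False if it was
    rejected; on rejection the database is left unchanged.
    """
    candidate = {**db, **update}           # tentative state after the update
    if all(check(candidate) for check in constraints):
        db.update(update)                  # commit the new values in place
        return True
    return False

# Example: salaries must be integers and non-negative.
salaries = {"smith": 18000, "jones": 22000}
valid_pay = lambda d: all(isinstance(v, int) and v >= 0 for v in d.values())
apply_checked(salaries, {"smith": -500}, [valid_pay])   # rejected
apply_checked(salaries, {"smith": 19000}, [valid_pay])  # accepted
```

Checking a tentative copy before installing it keeps a rejected update from ever becoming visible, at the cost of evaluating every constraint on each update.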

Loss of correctness (inconsistency) can take several
forms. In its simplest form, an individual value of a
particular field can be inappropriate: for instance, a
salary which is non-numeric, or a birth year which places
the person's age at 250 years. Secondly, there may be an
inconsistency between different fields of the same record,
such as an individual's salary not reflecting his
professional status. Thirdly, an inconsistency may occur
between the field(s) of one record and the field(s) of
related record(s). For example, in an employee database, a
rule that managers earn more than their employees may be
violated. Fourthly, certain global patterns in some set of
records may be violated. Such an inconsistency is due not to
any individual record but to a collection of them. The
global patterns are typically aggregate conditions: for
instance, the average salary of all employees is less than
$20000, or every department has exactly one manager.
Normally these violations are not due to a faulty value,
but to noncompliance with expected patterns. Lastly, blank
fields, obsolete values, or records that cannot be found
fall under the notion of missing data.
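The five forms of inconsistency listed above can be illustrated with a small checker over employee records. This is a sketch only: the `Employee` record layout, the per-title salary floors in `MIN_SALARY`, and the 120-year plausibility limit are assumptions made for illustration, while the manager-salary rule, the $20000 average and the one-manager-per-department pattern come from the examples in the text.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Union

# Hypothetical salary floors by title, used for the same-record rule.
MIN_SALARY = {"clerk": 8000, "manager": 15000}

@dataclass
class Employee:
    name: str
    dept: str
    title: str                      # "clerk" or "manager"
    salary: Union[int, float, str]  # loosely typed so bad values can be stored
    birth_year: int
    manager: Optional[str] = None   # name of this employee's manager

def numeric(value) -> bool:
    return isinstance(value, (int, float))

def find_violations(staff: List[Employee], year: int) -> List[str]:
    found = []
    by_name = {e.name: e for e in staff}
    for e in staff:
        # (1) A single field holds an inappropriate value.
        if not numeric(e.salary):
            found.append(f"{e.name}: non-numeric salary")
        if year - e.birth_year > 120:
            found.append(f"{e.name}: implausible birth year")
        # (2) Two fields of the same record disagree.
        if numeric(e.salary) and e.salary < MIN_SALARY.get(e.title, 0):
            found.append(f"{e.name}: salary too low for a {e.title}")
        # (5) Missing data: a referenced record cannot be found.
        if e.manager is not None and e.manager not in by_name:
            found.append(f"{e.name}: manager {e.manager} not found")
        # (3) Fields of related records disagree: managers earn more.
        boss = by_name.get(e.manager) if e.manager else None
        if (boss is not None and numeric(e.salary)
                and numeric(boss.salary) and boss.salary <= e.salary):
            found.append(f"{e.name}: paid at least as much as {boss.name}")
    # (4) Global patterns that only a collection of records can violate.
    pay = [e.salary for e in staff if numeric(e.salary)]
    if pay and sum(pay) / len(pay) >= 20000:
        found.append("average salary is not below $20000")
    heads = Counter(e.dept for e in staff if e.title == "manager")
    for dept in sorted({e.dept for e in staff}):
        if heads[dept] != 1:
            found.append(f"{dept}: {heads[dept]} managers")
    return found

staff = [
    Employee("Ada", "sales", "manager", 19000, 1940),
    Employee("Bob", "sales", "clerk", 21000, 1950, manager="Ada"),
    Employee("Cy", "audit", "clerk", "lots", 1729, manager="Dee"),
]
for problem in find_violations(staff, 1979):
    print(problem)
```

With these three records every category except the same-record rule (2) is reported: Bob out-earns his manager, Cy's salary and birth year are invalid and his manager cannot be found, the average is not below $20000, and the audit department has no manager.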

Users of shared databases presume that the correctness
of the information upon which they work is preserved.
Preservation of such consistency is one of the most critical
problems faced by the designers of database systems. The
semantic accuracy of the data is threatened by (i)
inadequate concurrency controls; (ii) system crashes; (iii)
security breaches; and (iv) erroneous software.

Concurrency Controls:

Shared access to the database is allowed in order to
maximize concurrent use of system resources. Concurrency
control is a system mechanism concerned with deciding what
actions should be taken in response to requests by
individual processes (transactions) to update the database.
The concurrency control should be capable of effectively
handling conflicting update requests, deadlocks and similar
occurrences, while maintaining the consistency of the
database. In a distributed database, the update mechanism
must guarantee that updates to database copies preserve the
mutual consistency of multiple copies of the replicated data,
as well as maintain the internal consistency of each database
copy. Mutual consistency requires that all copies of the
replicated data be identical, in the sense that they must
converge to the same final state if all user activity were
to cease. Internal consistency requires the maintenance of
the semantic accuracy of the data, just as in a
non-redundant database.

Consider the data entities $x$, $y$ and $z$ duplicated at
installations $N_1$ and $N_2$. Let the initial values of all
these entities at both installations be 1. Further, assume
that the internal consistency constraint is "the sum of the
values of $x$, $y$ and $z$ is 3". Consider the two updates
$U_1$ and $U_2$:

$U_1$: $x \leftarrow 0$, $y \leftarrow 2$;

$U_2$: $y \leftarrow 0$, $z \leftarrow 2$;

If $U_1$ and $U_2$ are both applied, the internal consistency
of the database will be destroyed, regardless of their order
of application (the sum becomes 2 or 4). Proper operation
dictates the rejection of one of the updates. The mutual
consistency of the data would be destroyed if updates $U_1$
and $U_2$ were accepted at $N_1$ and $N_2$ respectively,
even though the internal consistency of each copy is
preserved. The maintenance of consistency requires some
sort of locking scheme, which in turn can lead to deadlocks.
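The two-installation example can be replayed mechanically. The following sketch assumes each installation's copy is a plain dictionary and uses hypothetical helper names (`internally_consistent`, `mutually_consistent`, `applied`); it checks the constraint x + y + z = 3 on each copy and identity across copies.

```python
from typing import Dict

Copy = Dict[str, int]   # one installation's copy of x, y and z

def internally_consistent(copy: Copy) -> bool:
    # The example's integrity constraint: x + y + z must equal 3.
    return copy["x"] + copy["y"] + copy["z"] == 3

def mutually_consistent(*copies: Copy) -> bool:
    # All copies of the replicated data must be identical.
    return all(c == copies[0] for c in copies)

def applied(copy: Copy, update: Copy) -> Copy:
    # Return a new copy with the update's assignments installed.
    return {**copy, **update}

U1 = {"x": 0, "y": 2}
U2 = {"y": 0, "z": 2}
initial = {"x": 1, "y": 1, "z": 1}

# Both updates at one copy, in either order: the constraint fails.
print(internally_consistent(applied(applied(initial, U1), U2)))  # False (sum 2)
print(internally_consistent(applied(applied(initial, U2), U1)))  # False (sum 4)

# U1 accepted only at N1 and U2 only at N2: each copy is valid,
# but the two copies have diverged.
n1, n2 = applied(initial, U1), applied(initial, U2)
print(internally_consistent(n1), internally_consistent(n2))      # True True
print(mutually_consistent(n1, n2))                               # False
```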
System Crashes:

The problem of reliable operation of a distributed
database system in the presence of failures boils down to
maintaining database consistency during update transactions
despite either system failure or communication breakdown.
System malfunctions span a wide spectrum, including: (a) a
storage failure (head crash); (b) a system crash (software
failure); (c) a lost message; (d) a duplicated message;
(e) a lost process (due to a system crash and subsequent
recovery); (f) network partitioning; and (g) operation with
missing nodes.

Suppose, for instance, that a database transaction updates
three records, each stored at a different installation. The
updates do not take effect until all have been completed and
acknowledged. After receiving the acknowledgements from the
individual installations, the source of the transaction
sends a message to the three installations to put the
updates into effect. Typically, the problem of maintaining
consistency arises when one of the installations crashes
before receiving this message.
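The crash scenario above can be simulated in a few lines. The protocol described, in which updates are staged and acknowledged before a second message puts them into effect, resembles what is commonly called two-phase commit. The `Installation` class and `run_transaction` function below are hypothetical; the sketch models each installation's record as a single integer and shows how a crash between the two messages leaves the installations disagreeing.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Installation:
    name: str
    value: int = 0                 # committed value of the local record
    pending: Optional[int] = None  # staged update, not yet in effect
    up: bool = True

    def prepare(self, new_value: int) -> bool:
        """First message: stage the update and acknowledge it."""
        if not self.up:
            return False
        self.pending = new_value
        return True

    def commit(self) -> None:
        """Second message: put the staged update into effect."""
        if self.up and self.pending is not None:
            self.value, self.pending = self.pending, None

def run_transaction(sites: List[Installation], new_value: int,
                    crash_before_commit: Optional[Installation] = None) -> str:
    # The updates take no effect until every installation acknowledges.
    if not all(site.prepare(new_value) for site in sites):
        for site in sites:
            site.pending = None    # abort: discard anything staged
        return "aborted"
    # Simulated failure: a site crashes after acknowledging but before
    # the message that puts the update into effect reaches it.
    if crash_before_commit is not None:
        crash_before_commit.up = False
    for site in sites:
        site.commit()
    return "committed"

a, b, c = Installation("A"), Installation("B"), Installation("C")
print(run_transaction([a, b, c], 7, crash_before_commit=c))  # committed
print([site.value for site in (a, b, c)])                    # [7, 7, 0]
```

After the run, A and B hold the new value while C still holds the old one, together with a staged update it has not put into effect; deciding what C should do when it recovers is exactly the consistency problem described above.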

Security Breaches:

Potential security violations can be categorized into
the following three classes [Saltzer and Schroeder, 1975]:

(a) Unauthorized information release: refers to unauthorized
access by a person to take advantage of data stored in
the computer. An intruder may draw correct inferences by
observing the patterns of input and output, which can be
termed "data-traffic analysis". Tapping of network
communications may result in unauthorized exposure of
sensitive information, alteration of message text,
misrouting or misdelivery of messages, or spoofing of a
network resource.

(b) Unauthorized information modification: relates to the
ability of an unauthorized person to change or delete
the information stored in the computer. Network
penetrators may be able to use counterfeit network
resources.

(c) Unauthorized denial of use: an intruder can deny
authorized users the ability to access or update
information by sabotaging the system. Sabotage may take
the form of causing the system to 'crash', disrupting
the transaction manager, or even firing a bullet into
the computer.

Unauthorized release, modification, or denial of use can
potentially result in the loss of the information itself,
or in the loss of the semantic accuracy of the data.

Erroneous Software:

It is essential to verify and test that the software
produced is indeed the software intended. Such verification
can be achieved by proofs of correctness and penetration
tests. Faulty software may compromise the consistency of
the data on which it acts. The term 'fault-tolerant system'
is normally used to denote system software which will
continue to yield correct results in the presence of
hardware and software fault conditions. Discussion of this
area is limited here, since the topic is beyond the scope of
this thesis.

1.5 Overview and Outline of the Thesis

The thesis begins with a comprehensive overview of the
deadlock problem. The essential characteristics,
relationships and models of the deadlock problem in
operating systems, database systems, and distributed
databases are discussed in detail. The value of combined
solutions which incorporate detection, avoidance and
prevention principles is examined. Comments on the
game-theoretic approach to deadlock and on probabilistic
models of deadlock are presented.

Holt's [1972] graph-theoretic model is used and extended
to the distributed database case, and a new algorithm to
detect deadlocks using this model is proposed. The concept
of 'on-line' detection of deadlocks in distributed databases
is introduced and defined. An effective algorithm is
proposed to detect deadlocks on-line, which allows the data
resource allocation decision to take place without further
reference to other computers. A realistic approach is
taken, allowing processes to have more than one outstanding
resource request simultaneously. This approach also combines
the principles of detection and avoidance.

For recovery from system crashes, knowledge of the nature
of the processes that access the data is used to design
system recovery protocols. These protocols reflect the
importance of the availability of the on-line system at all
times, and advocate that the recovery overhead should depend
directly on the value or sensitivity of the data accessed.
An optimal policy for dynamic checkpointing is derived using
a new, simple model. The "domino effect" (the cascading
effect of global rollback) is modeled, and a rollback
algorithm is developed.

Finally, the overall significance of the results and the
motivation for the research are summarized. Outstanding
problems and areas that require further detailed solutions
are indicated.




CHAPTER 2


DEADLOCKS IN OPERATING, DATABASE, AND DISTRIBUTED SYSTEMS

2.1 The Deadlock Problem

Modern multiprogramming systems are designed to support
a high degree of parallelism by ensuring that as many system
components as possible operate concurrently. The increasing
trend among commercial firms towards on-line operations,
especially those involving integrated databases, and the
consequent need of active users for a responsive system,
have placed heavy demands on database-oriented operating
systems. Compounding these difficulties is the arrival of
distributed processing as a solution to incremental system
growth. One characteristic of such contemporary systems is
their high degree of resource and data sharing. Concurrency
must be regulated by some facility which controls access to
sharable resources. Computerized information systems
typically use locks [Gray, 1974; Gray et al., 1975, 1975]
for this purpose. A simple lock protocol associates a lock
with each data object. A process locks each object it uses
and holds the lock until the successful completion of the
transaction. The lock has the effect of notifying others
that the object is busy. Deadlocks arise when members of a
process group request access to resources already held
exclusively by other processes within the group. When no
member of the group is willing to relinquish control over
its resources until after it has completed its resource
acquisition, deadlock is inevitable, and can only be broken
by the intervention of some external agency.
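The simple lock protocol just described, and the way it produces a deadlock between two transactions, can be sketched as a lock table. This is an illustrative sketch assuming exclusive locks only, first-come first-served waiting queues, and hypothetical names (`LockTable`, `request`, `complete`); it does not detect the deadlock, it merely shows how one arises.

```python
from collections import defaultdict, deque
from typing import Deque, Dict

class LockTable:
    """Exclusive locks, each held until its transaction completes."""

    def __init__(self) -> None:
        self.holder: Dict[str, str] = {}     # data object -> transaction
        self.waiting: Dict[str, Deque[str]] = defaultdict(deque)

    def request(self, txn: str, obj: str) -> bool:
        """Grant the lock if obj is free; otherwise queue txn and return False."""
        owner = self.holder.get(obj)
        if owner is None or owner == txn:
            self.holder[obj] = txn
            return True
        if txn not in self.waiting[obj]:
            self.waiting[obj].append(txn)    # obj is busy: txn must wait
        return False

    def complete(self, txn: str) -> None:
        """Release txn's locks (on commit or abort) and wake the next waiters."""
        for queue in self.waiting.values():
            if txn in queue:
                queue.remove(txn)            # a finished txn waits for nothing
        for obj in [o for o, t in self.holder.items() if t == txn]:
            queue = self.waiting[obj]
            if queue:
                self.holder[obj] = queue.popleft()
            else:
                del self.holder[obj]

locks = LockTable()
locks.request("T1", "A")   # granted
locks.request("T2", "B")   # granted
locks.request("T1", "B")   # refused: T1 waits for T2
locks.request("T2", "A")   # refused: T2 waits for T1, a circular wait
```

Neither transaction can complete while it waits, so neither releases its lock. Calling `locks.complete("T2")` here plays the role of the external agency (for example, aborting T2), after which B is handed to T1.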

A set of processes becomes deadlocked as a result of
certain conditions, which may be informally summarized as
the exclusive access and circular wait conditions. The
simplest illustration of these involves only two processes,
each holding a different resource for exclusive access and
each requesting access to the resource held by the other.
The result is a circular wait which cannot be broken until
one of the processes releases the resource it holds, or
cancels the request it made.

In a more general case, a circular wait state involving
a set of processes is said to exist when these processes are
linked in a circular chain in such a way that every process
holds at least one resource and is waiting for at least one
more resource held by the next process in the chain. A
circular wait condition can arise only when the following
necessary conditions hold:

* each process requests exclusive control of one or more
resources (such as printers, tape drives, or data records
for updating);

* the processes hold the resources allocated to them while
seeking additional ones (data resources should be held
until process completion for consistency reasons
[Eswaran et al., 1976]); and

* resources cannot be preempted from processes without
aborting the processes (preemption is the reclaiming of a
resource by the system and requires the support of a
rollback and recovery mechanism, especially when data
resources are involved).
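Whether the circular wait state defined above exists among blocked processes can be checked by searching a wait-for graph for a cycle. The sketch below is a standard depth-first search, not the on-line algorithm developed later in this thesis; it assumes single-unit resources and that a blocked process waits for every holder of the resources it has requested, so a process with several outstanding requests has several outgoing edges. The function name `find_cycle` and the graph encoding are hypothetical.

```python
from typing import Dict, List, Optional, Set

def find_cycle(waits_for: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return the processes on one circular wait, or None if there is none.

    waits_for[p] is the set of processes holding resources that p has
    requested; a process with several outstanding requests has several
    outgoing edges.
    """
    UNSEEN, ON_PATH, DONE = 0, 1, 2
    state: Dict[str, int] = {}
    path: List[str] = []

    def visit(p: str) -> Optional[List[str]]:
        state[p] = ON_PATH
        path.append(p)
        for q in waits_for.get(p, set()):
            if state.get(q, UNSEEN) == ON_PATH:   # edge back into the path
                return path[path.index(q):]
            if state.get(q, UNSEEN) == UNSEEN:
                cycle = visit(q)
                if cycle:
                    return cycle
        path.pop()
        state[p] = DONE
        return None

    for p in waits_for:
        if state.get(p, UNSEEN) == UNSEEN:
            cycle = visit(p)
            if cycle:
                return cycle
    return None

print(find_cycle({"P1": {"P2"}, "P2": {"P1"}}))   # ['P1', 'P2']
print(find_cycle({"P1": {"P2"}, "P2": set()}))    # None
```

The search visits each process and each edge once, so its cost is linear in the size of the graph; the difficulty in a distributed database is that the edges are spread over several installations.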

2.2 Examples of Deadlocks

The deadlock problem occurs in many different contexts.
Analogies can be made to real-life situations, provided one
interprets the processes and resources involved
appropriately. For instance, one often hears about
"deadlocked" peace talks between two countries which were
at war. In that context, the peace-negotiating parties of
the two countries are the processes, whereas the occupied
territories are the resources over which exclusive control
is sought. The peace talks could be deadlocked if both
parties refuse to give up any occupied lands until after the
return of some land which their adversary holds. Many other
examples of deadlock arise in day-to-day operations in the
real world.

Another common example is that of the traffic deadlock.
At uncontrolled intersections, or at intersections marked
with 4-way stop signs, traffic regulations require that,
when two or more vehicles approach or enter the intersection
on adjacent roads at approximately the same time, the driver
of the vehicle on the left shall yield the right of way to
the vehicle on the right.

Consider the situation depicted in Figure 2.1, where
the four cars A, B, C, and D have arrived at an intersection
marked with 4-way stop signs at approximately the same time.
According to the traffic regulation, A should yield to B, B
to C, C to D, and D to A. As far as A is concerned, part of
the intersection belongs to B for exclusive use, so A must
wait until B passes, and so on. Evidently, a circular wait
is an essential component of this traffic deadlock. Note
that if all four vehicles move forward, occupying as much of
the intersection as possible, all traffic comes to a
standstill. Many major cities avoid this difficulty by
cross-hatching important intersections and requiring that no
vehicle enter that area unless its exit route is clear. In
this illustration, the cars A, B, C, D are the processes,
while the space in the intersection is a resource to which
each needs exclusive access. The cross-hatching technique is
a means of ensuring that no process will acquire a resource
(i.e. occupy the intersection) which it cannot subsequently
relinquish (i.e. leave by a clear exit) without loss of
control or function.
Figure 2.1: Traffic deadlock at an intersection
marked with 4-way stop signs.

If  the  four  quarters  of  the  intersection  are  marked  1, 
2,  3,  and  4,  the  deadlock  situation  is  different  depending 
on  whether  the  cars  turn  right,  turn  left,  or  go  straight. 
Assuming  the  exit  route  for  each  car  outside  of  the 
intersection  is  clear,  the  apparent  deadlock  can  be  handled 
easily  in  the  case  when  all  the  cars  wish  to  make  a  right 
turn.  Each  car  requires  access  to  one  quarter  of  the 
intersection  which  no  others  demand.  For  instance,  in 
Figure 2.1, cars A, B, C, and D need access to quarters 1,
2, 3, and 4 of the intersection respectively. Actually, in
this case, there is no resource contention.

The  state  of  affairs  is  different  and  difficult  when 
all the cars go straight, turn left, or do some
combination of the two. Car A requires quarters 1 and 2 of
the  intersection  to  go  straight.  Similarly  car  B  needs 
quarters  2  and  3,  and  so  on.  In  these  circumstances,  the 
deadlock  arises  due  to  each  car  requiring  exclusive  access 
to  the  quarter  of  the  intersection  to  which  the  car  on  its 
right  side  has  legal  right.  Each  car  requires  two  resources 
(considering  each  quarter  of  the  intersection  as  a  different 
resource),  one  of  which  is  held  by  the  car  on  its  right. 

The  traffic  deadlock  condition  worsens  if  all  the  cars 
desire  to  turn  left.  For  instance,  car  C  requires  the  use 
of  quarters  3,  4,  and  1  of  the  intersection.  Quarters  4  and 
1  are  legally  held  by  the  cars  D  and  A  respectively. 
Similarly  car  D  needs  the  quarters  4,  1,  and  2,  of  which 
quarters  1  and  2  are  held  by  cars  A  and  B.  In  this  traffic 
case,  the  deadlock  arises  because  of  each  car  being  blocked 
by  two  other  cars,  one  on  its  right  side  and  the  other 
straight  ahead.  Each  resource  (every  quarter  of  the 
intersection)  has  two  waiting  processes  (cars)  besides  the 
one  which  legally  holds  it  for  access. 
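The quarter-by-quarter reasoning above can be checked mechanically. The Python sketch below is illustrative only: it assumes each car already holds the quarter directly ahead of it (numbered 0 to 3 here, so quarter 0 is quarter 1 of the text and car A is car 0), that a car waits on the owner of every other quarter its maneuver needs, and it searches the resulting wait-for graph for a circular wait. The names QUARTERS_NEEDED, wait_for_graph and has_cycle are hypothetical.

```python
# Hypothetical model of Figure 2.1: car i (A=0, B=1, C=2, D=3) holds quarter i,
# so the owner of quarter q is car q.
QUARTERS_NEEDED = {"right": 1, "straight": 2, "left": 3}


def wait_for_graph(maneuver, n=4):
    """Map each car to the set of cars whose quarters it still needs."""
    k = QUARTERS_NEEDED[maneuver]
    graph = {}
    for car in range(n):
        needed = [(car + j) % n for j in range(k)]
        graph[car] = {q for q in needed if q != car}  # a car never waits on itself
    return graph


def has_cycle(graph):
    """Depth-first search; meeting a node still on the stack means a circular wait."""
    on_stack, done = set(), set()

    def visit(v):
        on_stack.add(v)
        for w in graph[v]:
            if w in on_stack or (w not in done and visit(w)):
                return True
        on_stack.discard(v)
        done.add(v)
        return False

    return any(v not in done and visit(v) for v in graph)


for maneuver in ("right", "straight", "left"):
    print(maneuver, has_cycle(wait_for_graph(maneuver)))
# prints, one per line: right False, straight True, left True
```

Under these assumptions the right-turn case has no circular wait, while the straight-ahead and left-turn cases both contain one; in the left-turn case each car also waits on the car straight ahead of it, matching the two waiting processes per quarter noted above.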

The traffic situation can be resolved through the
intervention of an external agency (a policeman) allocating
space to one of the vehicles. As a practical matter, the
traffic deadlock is usually resolved by one driver
aggressively entering the intersection. However, an
alternative solution might be to require vehicles to back up
a random amount and approach the crossroads again, thereby
breaking the simultaneous arrival. A similar technique is
used in ETHERNET [Metcalfe and Boggs, 1976] to resolve
access conflicts on the communication medium, and is common
on  contention-mode  telecommunication  circuits,  except  that 
"random"  means  varied  but  fixed. 

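A minimal sketch of the random back-off remedy, in Python, under stated assumptions: every car draws a delay uniformly from 1 to max_backoff time units, and the car with the strictly smallest delay re-approaches first; on a tie the round is simply repeated. The function name and its parameters are hypothetical, and the uniform range is a simplification, not the back-off schedule actually used by ETHERNET.

```python
import random


def resolve_by_backoff(cars, max_backoff=10, rng=None):
    """Break a simultaneous arrival by random back-off (illustrative sketch).

    In each round every waiting car backs up a random number of time units
    (1..max_backoff). If exactly one car drew the smallest delay, it
    re-approaches first and enters the intersection; on a tie the round is
    repeated. Returns (winner, rounds_needed).
    """
    cars = list(cars)
    if len(cars) > 1 and max_backoff < 2:
        raise ValueError("max_backoff must be at least 2 to break ties")
    rng = rng or random.Random()
    rounds = 0
    while True:
        rounds += 1
        delays = {car: rng.randint(1, max_backoff) for car in cars}
        smallest = min(delays.values())
        first = [car for car, d in delays.items() if d == smallest]
        if len(first) == 1:  # the simultaneous arrival is broken
            return first[0], rounds
```

Passing a seeded random.Random makes a run reproducible, e.g. resolve_by_backoff("ABCD", rng=random.Random(7)). Termination is only probabilistic, which is why the delay range must offer more than one value.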
Of  greater  interest,  perhaps,  are  the  deadlocks  that 
may  occur  in  computer  operating  and  database  systems.  The 
term  "system  resource"  in  computers  broadly  refers  to 
storage media (like primary memory, tapes, disks, drums,
etc.), system components (such as tape drives, disk drives,
I/O  channels,  CPU,  readers  and  printers,  etc.),  and 
information  (for  example  communication  messages,  data 
records,  files,  directories,  programs,  system  routines, 
etc.).  Consider  a  small  multiprogramming  system  with  a 
single  card  reader  and  a  printer,  in  which  two  user  jobs 
share  use  of  the  printer  and  the  card  reader  by  means  of 
request  and  release  operations,  as  given  in  standard 
operating  systems  texts.  Due  to  independent  scheduling  of 
the  jobs  the  request  and  release  operations  can  be 
interspersed  in  several  different  orders.  Some  of  these 
sequences  lead  to  a  "deadly  embrace"  due  to  jobs  holding 
respectively  the  printer  or  the  card  reader,  and  at  the  same 
time  requesting  the  unit  held  by  the  other. 
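The effect of independent scheduling can be made concrete by enumerating every interleaving of two job scripts. In the Python sketch below, which is illustrative only, the scripts JOB1 and JOB2 and the function count_outcomes are hypothetical: job 1 takes the card reader before the printer, job 2 the reverse, and a request blocks while the unit is held. The function counts how many complete schedules finish and how many end in a deadly embrace.

```python
from functools import lru_cache

# Hypothetical job scripts: ("req", unit) waits until the unit is free,
# ("rel", unit) gives it back. Job 1 takes the reader first, job 2 the printer.
JOB1 = (("req", "reader"), ("req", "printer"), ("rel", "printer"), ("rel", "reader"))
JOB2 = (("req", "printer"), ("req", "reader"), ("rel", "reader"), ("rel", "printer"))


def count_outcomes(jobs):
    """Explore every interleaving; return (completed, deadlocked) schedule counts."""

    @lru_cache(maxsize=None)
    def explore(pcs, owners):
        owner = dict(owners)  # unit -> index of the job holding it
        completed = deadlocked = 0
        moved = False
        for j, job in enumerate(jobs):
            if pcs[j] == len(job):
                continue  # job j has finished
            op, unit = job[pcs[j]]
            if op == "req" and unit in owner:
                continue  # job j is blocked on a held unit
            new_owner = dict(owner)
            if op == "req":
                new_owner[unit] = j
            else:
                del new_owner[unit]
            new_pcs = pcs[:j] + (pcs[j] + 1,) + pcs[j + 1:]
            c, d = explore(new_pcs, tuple(sorted(new_owner.items())))
            completed += c
            deadlocked += d
            moved = True
        if moved:
            return completed, deadlocked
        # Nobody can move: either every job is done or we reached a deadly embrace.
        finished = all(pc == len(job) for pc, job in zip(pcs, jobs))
        return (1, 0) if finished else (0, 1)

    return explore(tuple(0 for _ in jobs), ())


print(count_outcomes((JOB1, JOB2)))  # -> (4, 2)
print(count_outcomes((JOB1, JOB1)))  # -> (2, 0), both jobs take the reader first
```

Of the six possible schedules for the mixed orders, two end with each job holding one unit and requesting the other; when both jobs request the reader first, no schedule deadlocks, which anticipates the resource-ordering remedy discussed in Section 2.3.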
In the database case, consider two concurrently
executing  processes  P  and  Q,  which  modify  entities  M  and  N. 


For example:

        PROCESS P                            PROCESS Q
        p1: M <== M + 100                    q1: N <== 2 * N
        p2: N <== N + 100                    q2: M <== 2 * M

        STEP  ACTION                         STEP  ACTION
        P0    REQUEST ENTITY M               Q0    REQUEST ENTITY N
        P1    LOCK ENTITY M                  Q1    LOCK ENTITY N
        P2    READ ENTITY M                  Q2    READ ENTITY N
        P3    WRITE ENTITY M                 Q3    WRITE ENTITY N
        P4    REQUEST ENTITY N               Q4    REQUEST ENTITY M
        P5    LOCK ENTITY N                  Q5    LOCK ENTITY M
        P6    READ ENTITY N                  Q6    READ ENTITY M
        P7    WRITE ENTITY N                 Q7    WRITE ENTITY M
        P8    UNLOCK ENTITY M                Q8    UNLOCK ENTITY N
        P9    UNLOCK ENTITY N                Q9    UNLOCK ENTITY M

        Sequence of steps that leads to deadlock:
        P0 Q0 P1 Q1 P2 P3 Q2 Q3 P4 Q4

Figure 2.2: Deadlock in consistent database systems.
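The schedule of Figure 2.2 can be replayed against a lock table that checks for a circular wait at the moment each request is refused. The Python sketch below is a single-site illustration under simplifying assumptions, not the detection algorithm developed in Chapter 3: each REQUEST step is treated as an immediate lock attempt (so the LOCK, READ and WRITE steps leave the table unchanged), locks are exclusive, and unlocking and waking of waiters are omitted. LockTable, SCHEDULE, REQUESTS and run are hypothetical names.

```python
class LockTable:
    """Exclusive locks plus an immediate circular-wait check (illustrative sketch)."""

    def __init__(self):
        self.holder = {}     # entity -> process currently holding its lock
        self.waits_for = {}  # blocked process -> process it is waiting on

    def request(self, proc, entity):
        owner = self.holder.get(entity)
        if owner is None or owner == proc:
            self.holder[entity] = proc
            return "granted"
        self.waits_for[proc] = owner
        # Follow the chain of waits from the owner; reaching proc again means
        # the new wait edge closes a circular wait.
        seen, p = set(), owner
        while p is not None and p not in seen:
            if p == proc:
                return "deadlock"
            seen.add(p)
            p = self.waits_for.get(p)
        return "waiting"


# Figure 2.2 schedule; only the REQUEST steps change the lock table here.
SCHEDULE = ["P0", "Q0", "P1", "Q1", "P2", "P3", "Q2", "Q3", "P4", "Q4"]
REQUESTS = {"P0": ("P", "M"), "Q0": ("Q", "N"), "P4": ("P", "N"), "Q4": ("Q", "M")}


def run(schedule):
    table = LockTable()
    return [(step, table.request(*REQUESTS[step])) for step in schedule if step in REQUESTS]


print(run(SCHEDULE))
# -> [('P0', 'granted'), ('Q0', 'granted'), ('P4', 'waiting'), ('Q4', 'deadlock')]
```

Replaying the sequence grants M to P and N to Q, leaves P waiting on Q at step P4, and reports the circular wait as soon as Q requests M at step Q4.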

Let  us  assume  that  the  database  correctness  (consistency) 
assertion  on  the  entities  is  that  M  =  N,  and  that  the 
initial values of M and N are the same. Interleaving the
actions of processes P and Q in an arbitrary fashion, such
as p1-q1-p2-q2, leads to the loss of database correctness:
starting from M = N = x, this order yields M = 2x + 200 but
N = 2x + 100.
Although it is possible to construct the processes so that
database correctness is maintained, concurrent operation can
still lead to deadlock. In order to retain the "strong
consistency" result of Eswaran et al. [1975], which requires
that the processes be "well-formed" and "two-phase", the
processes  P  and  Q  are  divided  into  disjoint  locking  and 
unlocking  phases.  A  well-formed  process  is  one  which  locks 
an  entity  before  acting  on  it  further,  and  subsequently 
unlocks  such  entities.  A  process  is  thus  required  to  be 
divided  into  growing  and  shrinking  phases.  The  first  unlock 
action  marks  the  beginning  of  the  shrinking  phase,  after 
which  a  process  cannot  issue  a  lock  request  on  any  entity  in 
the  database  until  the  release  of  all  entities  held  by  the 
process.  In  this  context,  it  is  essential  to  note  that  the 
process  P  (or  Q)  cannot  unlock  entity  M  (or  N)  before 
locking entity N (or M), if it is to maintain database
correctness.  The  actions  of  processes  P  and  Q,  and  a 
sequence  of  steps  that  would  lead  to  a  deadlock  under 
concurrent  operation,  are  shown  in  Figure  2.2.  Besides  this 
illustration,  the  example  is  used  again  in  Section  2.5  to 
demonstrate  the  deadlock  problem  which  could  arise  in  a 
distributed  database  environment.  Several  other  aspects  of 
concurrent  operation  such  as  transaction,  lock,  log,  and 
recovery  management  have  been  dealt  with  thoroughly 
elsewhere [Gray, 1978], and are beyond the scope of this thesis.

2.3 Approaches to Handling Deadlocks

The  basic  strategies  for  handling  deadlocks  are 
detection,  prevention  and  avoidance  (by  heuristic  means). 

Detection Techniques:

When  deadlock  detection  schemes  are  used  the  requested 
resources  are  granted  where  possible.  These  techniques 
periodically  invoke  an  algorithm  which  examines  the  current 
resource  allocations  and  outstanding  requests,  in  order  to 
identify  any  processes  and  resources  involved  in  a  deadlock. 
If  a  deadlock  is  discovered,  the  system  must  recover  as 
gracefully  as  possible  by  preempting  resources  from  relevant 
processes  until  the  deadlock  is  broken. 
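The detect-then-recover cycle described above can be sketched in Python as follows. This is an illustration under assumptions, not a published algorithm: the state is given as a wait-for graph (each process mapped to the set of processes it waits on), the cost of preempting each process is supplied by the caller as a hypothetical cost table, and on every circular wait found the cheapest member is chosen as the victim. Preempting a victim is modelled by deleting it, so that processes waiting on it are no longer blocked.

```python
def find_cycle(graph):
    """Return the processes on one circular wait in a wait-for graph, or None."""
    on_stack, done, path = set(), set(), []

    def dfs(v):
        on_stack.add(v)
        path.append(v)
        for w in graph.get(v, ()):
            if w in on_stack:
                return path[path.index(w):]
            if w not in done:
                cycle = dfs(w)
                if cycle:
                    return cycle
        on_stack.discard(v)
        path.pop()
        done.add(v)
        return None

    for v in list(graph):
        if v not in done:
            cycle = dfs(v)
            if cycle:
                return cycle
    return None


def detect_and_recover(graph, cost):
    """One periodic detection pass: preempt the cheapest process on each
    circular wait until the wait-for graph is acyclic. Returns the victims."""
    graph = {p: set(waits) for p, waits in graph.items()}  # work on a copy
    victims = []
    while (cycle := find_cycle(graph)) is not None:
        victim = min(cycle, key=lambda p: cost[p])
        victims.append(victim)
        graph.pop(victim, None)       # the victim gives up everything it holds,
        for waits in graph.values():  # so nobody waits on it any more
            waits.discard(victim)
    return victims


waits = {"P1": {"P2"}, "P2": {"P3"}, "P3": {"P1"}, "P4": {"P1"}}
print(detect_and_recover(waits, {"P1": 5, "P2": 1, "P3": 7, "P4": 2}))  # -> ['P2']
```

The pass repeats until no cycle remains, so disjoint circular waits each lose one victim. Choosing by minimum cost is only one simple heuristic; as the next paragraph notes for tape drives, the real cost of a preemption depends heavily on the resource involved.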

The overhead involved in detection comprises the
run-time  cost  of  the  algorithm  and  the  potential  losses 
inherent  in  preempting  resources.  In  detection  schemes  no 
action  is  prompted  until  the  actual  occurrence  of  a 
deadlock.  Thus  a  resource  may  be  held  idle  by  a  blocked 
process  for  a  long  period  of  time.  If  the  resource  held  is 
a  tape  drive,  for  which  preemption  involves  possibly 
unacceptable  overhead,  it  is  difficult  to  use  detection 
principles  effectively. 

Nevertheless,  detection  techniques  have  some 
advantages,  since  the  scheme  is  invoked  intermittently.  In 
contrast,  avoidance  or  prevention  mechanisms  have  to  ensure 
that deadlocks never occur for any request made, resulting
in undue process waits and run-time overhead. On-line
handling  of  deadlocks  is  facilitated  by  detection  principles 
as  developed  in  Chapter  3.  Detection  is  used  in  conjunction 
with  a  resumption  mechanism,  such  as  a  task  swapping 
facility.  Resource  preemptions  are  minimal  since  only  the 
essential  ones  occur. 

In  the  context  of  database  systems  or  distributed 
databases,  detection  methods  rely  on  the  management  system 
to  abort,  rollback,  and  restart  at  least  one  database 
process  to  break  the  deadlock.  Here,  the  problem  of 
rollback  and  recovery  assumes  great  importance  from  the 
viewpoint  of  maintaining  consistency  of  the  database. 

Prevention Mechanisms:

Prevention  is  the  process  of  systematically  structuring 
the  requests  of  processes  in  a  fashion  such  that  deadlocks 
will  never  occur,  by  putting  constraints  on  the  system's 
users.  Most  proposals  for  prevention  require  that  each 
process  specify,  a  priori,  all  the  resources  needed  before 
processing begins. A prevention mechanism differs from an
avoidance scheme in that no run-time testing of potential
allocations is needed. Deadlock can be prevented in several
ways, including requesting all resources at once, preempting
resources held, and resource ordering.

The  simplest  method  of  preventing  deadlocks  is  to 
outlaw  concurrency,  which  is  an  administrative  solution  to 
the problem. However, it is not consistent with the present
day  philosophy  of  system  design  and  leads  to  very  poor  usage 
of  resources.  Another  method  requires  that  all  resources  be 
acquired  before  processing  starts.  Such  a  scheme  is 
inefficient  since  resources  held  may  be  idle  for  prolonged 
periods.  This  method  works  well  for  processes  which  perform 
a  single  burst  of  activity,  such  as  input/output  drivers. 

For  processes  with  fluctuating  requirements  the  method  may 
be impractical. In a database environment, it may be
impossible for a highly data-driven process to specify
and acquire all needed resources before beginning
processing. In any case, the scheme discriminates heavily
against data-driven processes.

Certain  other  prevention  methods  require  a  blocked 
process  to  release  a  resource  held,  when  requested  by  an 
active  process.  In  such  schemes  a  process  goes  from  an 
active  to  a  blocked  state  when  its  request  for  some  resource 
cannot  be  satisfied  immediately.  A  typical  example  is  the 
usage  of  main  memory,  where  a  process  is  completely  swapped 
out  by  preempting  the  memory  it  holds,  whenever  more  than 
the  currently  available  memory  is  requested.  The  process  is 
swapped  back  only  when  the  entire  larger  quantity  of  memory 
is  available.  In  this  scheme  preemption  of  resources  is 
done  more  often  than  necessary.  Under  certain  circumstances 
in  database  systems,  this  scheme  is  subject  to  what  is 
called "cyclic restart" [Stearns et al., 1976], in which two
or more processes continually block, abort, and restart each
other, as shown with a detailed example in Appendix A.

A more refined form of prevention is resource
ordering [Habermann, 1969]. All resources are uniquely
ordered,  and  a  request  for  a  specific  resource  is  met  if  and 
only  if  all  resources  lower  in  the  ordering  that  are  needed 
in  the  future  are  also  allocated.  Blocking  of  processes  in 
a  circular  wait  is  ruled  out  by  the  ordering  rule.  The 
feasibility  of  enforcing  resource  ordering  by  compile-time 
checks is a major advantage of the scheme. Run-time
computation is not needed, as the deadlock problem is solved
completely in the system design. Besides eliminating the
overhead associated with an avoidance algorithm, the scheme
simplifies the scheduling problem. However, it restricts the
sequences of requests that a process may make.
This  method  relies  heavily  on  educating  system  users 
regarding  the  ordering  rule.  A  process  may  request  and  hold 
a  resource  rather  early  in  the  processing  stage,  in  which 
case  an  incremental  request  for  the  same  resource  at  a  later 
stage  is  disallowed  by  the  ordering  rule.  This  leads  to 
preempting  all  the  resources  held  by  the  process.  In  a 
database  system  environment  with  processes  of  fluctuating 
needs, it is difficult, if not impossible, to order the
data  resources  (records,  entities,  or  fields). 
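Resource ordering is easy to enforce mechanically. The Python sketch below uses the common formulation of the rule, in which a process may request a resource only if it ranks above every resource the process already holds; this is phrased differently from the rule quoted above, so it should be read as an illustration of the idea rather than as Habermann's exact rule. The class OrderedAllocator and its rank table are hypothetical.

```python
class OrderedAllocator:
    """Resource-ordering prevention (sketch): a process may request a resource
    only if it ranks above every resource the process already holds."""

    def __init__(self, rank):
        self.rank = rank  # resource -> unique position in the global ordering
        self.held = {}    # process -> set of resources it holds
        self.owner = {}   # resource -> process holding it

    def request(self, proc, res):
        mine = self.held.setdefault(proc, set())
        if any(self.rank[r] >= self.rank[res] for r in mine):
            raise ValueError(f"{proc}: request for {res} violates the ordering")
        if res in self.owner:
            # The caller must wait. Along any chain of waits the ranks strictly
            # increase, so the chain can never close into a circle.
            return False
        self.owner[res] = proc
        mine.add(res)
        return True

    def release_all(self, proc):
        for res in self.held.pop(proc, set()):
            del self.owner[res]


alloc = OrderedAllocator({"reader": 1, "printer": 2})
print(alloc.request("J1", "reader"))   # -> True
print(alloc.request("J2", "printer"))  # -> True
try:
    alloc.request("J2", "reader")      # printer first, then reader: rejected
except ValueError as err:
    print(err)
```

With the card reader ranked below the printer, a job that asks for the printer and then the reader is rejected outright, so the deadly embrace of the printer and card reader example cannot arise; the job must instead release what it holds (release_all) and start again in ranked order, which is the preemption cost noted above.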

Avoidance Schemes:

In  avoidance  schemes,  a  request  by  a  process  for  a 
resource  is  granted  if  and  only  if  it  is  certain  that  this 
allocation still leaves at least one safe path for all the
processes  to  complete  their  execution.  Effectively,  the 
scheme  projects  detection  into  the  future  in  order  to  keep 
the  system  from  committing  itself  to  an  allocation  which 
will  eventually  lead  to  deadlock.  To  avoid  deadlock, 
therefore,  it  is  necessary  to  have  some  advance  information 
on  the  future  resource  requirements  of  processes. 

Empirical  observations  have  suggested  that,  to  a  great 
extent,  deadlock  prevention  mechanisms  undercommit  resources 
while  the  detection  techniques  give  away  resources  so  freely 
that  deadlock  or  near-deadlock  situations  arise  frequently. 
Avoidance  schemes  select  a  mid-way  approach  between  highly 
conservative  prevention  mechanisms  and  very  liberal 
detection  techniques. 

Avoidance  schemes  do  not  require  preemption,  but  do 
need  knowledge  of  the  future.  This  knowledge  may  be  in  the 
form  of  upper  bounds  on  the  quantity  of  each  resource  group 
required by the process. If the upper bounds are generous,
the resources are used inefficiently, yet tight upper bounds
are difficult to obtain in advance.

In  a  heavily  loaded  system  with  most  resources  allocated, 
there  are  very  few  safe  allocations  for  the  outstanding 
requests.  This  may  lead  to  processes  getting  blocked  for 
long  periods  while  holding  useful  resources. 

In both prevention and avoidance of deadlock, recovery
from programming errors in the system implementation still
requires a rollback mechanism. A summary of these basic
strategies and their comparison is given in Tables
2.1 and 2.2.

Ignoring Deadlocks:

One  less  novel  approach  to  handling  deadlocks  is  to 
ignore them. Such strategies have been referred to as an
"indifferent strategy" [Isloor and Marsland, 1978] or a "no
strategy strategy" [Holt, 1972]. In contrast to detection,
prevention, or avoidance methods, ignoring deadlocks saves
the overhead of running and maintaining such algorithms. In
such  a  case,  the  onus  of  recognizing  deadlocks  is  borne  by 
either  a  shrewd  computer  operator  discovering  blocked 
processes,  or  a  skilled  user  waiting  for  an  answer. 

Ignoring  deadlocks  can  have  disastrous  effects  on  the 
consistency  of  data  in  both  distributed  and  centralized 
database  systems. 

2.4  Models  of  Deadlock 

In  considering  the  deadlock  problem,  our  interest  in 
this  section  lies  in  the  representation  of  process 
interactions  while  allocating  resources  to  processes. 
Processes  in  computer  systems  can  be  dynamic  in  the  sense 
that  one  process  may  create  another.  Processes  are  said  to 
interact explicitly when they inter-communicate among
themselves, using messages created by the sending processes
and consumed by the receiving processes. Process
interactions as a result of competition for access to
physical objects are termed implicit. Either explicit or
implicit interactions can lead to the blocking of processes.
Holt has proposed the distinction between "reusable resources"
and "consumable resources" to model implicit and explicit
interactions respectively [Holt, 1971, 1972].

The  physical  devices  of  a  computer  system,  such  as  tape 
drives,  disks,  memory,  channels,  are  reusable  resources. 
There  is  always  a  fixed  total  number  of  units  of  these 
resources  in  any  system.  Any  unit  of  a  particular  resource 
can be held by only one process at a time. Thus, each unit of a
resource is either free for allocation or is held by a
process. The allocation strategy determines the unit in which a
resource is allocated. For example, memory may be allocated by pages,
disks  may  be  held  in  units  of  tracks  or  entire  disks,  and 
data  may  be  assigned  as  records,  fields,  entities,  or  files. 

Message  text  from  operators,  external  interrupts, 
inter-communication  between  processes,  and  card  images 
produced  by  a  card  reader  are  examples  of  consumable 
resources.  Typically,  the  total  number  of  resource  units  is 
not fixed. Each unit of a consumable resource ceases to exist
once it is acquired by a process. A process which creates a
consumable  resource  must  be  treated  as  if  it  were  holding 
the  resource.  Such  a  creator  of  a  resource  may  release  any 
number  of  units  of  the  resource.  Consumable  resources  are 
created  and  released  by  a  producing  process,  and  are 
requested, acquired and used by other processes. Reusable
resources, by contrast, are assigned by the resource manager to
requesting  processes  which,  after  acquiring  and  using  them, 
return  them  to  the  manager. 

Many  resources,  such  as  tape  drives  and  printers, 
permit  only  exclusive  use  by  one  process  at  a  time  but 
others,  like  data  resources  and  read-only  programs,  may 
allow  shared  use  by  several  processes.  Another 
characteristic of a resource is the possibility of
preemption.  Some  resources  may  be  taken  back  by  the  system 
without  any  action  by  the  process.  Such  a  process  is  then 
either  aborted,  rolled  back  (if  necessary)  and  restarted,  or 
forced  to  rerequest  and  thus  wait  for  the  preempted 
resource.  The  cost  of  aborting  or  restarting  the  process 
accounts  for  the  inherent  losses  due  to  preemption.  In 
certain  cases,  it  is  possible  to  suspend  the  process  and 
preempt  the  resource,  yet  preserve  the  current  states  of  the 
process  and  its  use  of  that  resource  for  a  later  resumption. 
Such  resumption  does  not  lose  processing  time  already  spent. 
Typical  examples  of  resumption  are: 

(a)  CPU  interrupts,  in  which  the  resource  preempted  is  the 
CPU.  The  information  that  must  be  preserved  for  later 
restoration  is  the  status  of  the  process  (for  example 
registers  and  the  "program  status  word");  and 

(b) swapping, in which primary memory is the preempted
resource  and  backup  is  provided  in  secondary  storage. 

When  the  system  has  different  resource  types  and  more 
than  one  resource  of  the  same  type,  the  degree  of  complexity 
of the deadlock problem increases. Attempts to model and
formalize the problem have resulted in two major proposals
[Coffman et al., 1971; Holt, 1972]. In these proposals, process
interactions have been represented by graph-theoretic models,
and deadlocks have been expressed precisely in terms of graphs.

Given  a  set  of  processes  and  a  set  of  distinct 
resources in use by these processes, Coffman et al. define a
state graph as a directed graph whose nodes correspond to
the resources and whose edges are defined as follows: at any
given instant of time, there exists an edge directed from a
resource node R_i to a resource node R_j, provided some
process P has access to R_i and has requested access to R_j.

It  has  been  shown  that  a  directed  circuit  in  the  state  graph 
is  a  necessary  and  sufficient  condition  for  deadlock. 

Example 2.1: Let P1, P2, P3 and R1, R2, R3 be
respectively  the  processes  and  resources  in  the  system.  Let 
R^  be  held  for  shared  access  by  both  and  P^,  and  let  P^ 
be  waiting  for  exclusive  access  to  R^ .  Assume  that  R^  and 
r^  are  held  for  exclusive  access  by  P^  and  P^  respectively, 
while  P^  and  P^  respectively  wait  for  exclusive  access.  The 
process  interactions  here  can  be  represented  by  the  state 
graph  shown  in  Figure  2.3.  The  existence  of  a  directed 
circuit  involving  three  nodes  in  the  state  graph  clearly 
means that there exist at least three deadlocked processes.
For  clarity  and  better  understanding,  we  have  labeled  the 
edges  in  the  state  graph.  For  instance,  the  edge  directed 
from R, to R is labeled sP e indicating that P holds R
*  ^  *  -  2  1 

for  shared  access  and  is  waiting  for  exclusive  access  to  R^ , 
and  so  on. 


Figure 2.3: State graph for Example 2.1.

For  the  general  case,  with  more  than  one  unit  of  some 
resources,  the  state  graph  defined  above  is  inappropriate. 
Coffman et al. propose that the resources be partitioned
into different types, in which resources of a given type are
identical. The nodes in the state graph then represent
resource types. A directed edge exists from a node
representing one resource type to another whenever
any process has acquired access to at least one unit of the
former resource type and has requested access to at least
one unit of the latter type. A directed circuit in such a
generalized state graph is still necessary for deadlock
to exist; however, it is no longer sufficient.

Holt's model of a system of interacting processes is a
classic piece of work, thorough and comprehensive. The
characteristic feature of the approach is the use of a
"general resource system" which models reusable as well as
consumable resources. A general resource graph is defined
as a bipartite graph whose disjoint sets of nodes are the
set of processes and the set of resources (reusable and
consumable). The set of resources is associated with an
available units vector whose elements are integers giving the
number of available units of each resource. Edges directed from a
process node to a resource node are termed request edges.
Edges  directed  from  reusable  and  consumable  resource  nodes 
to  processes  are  called  assignment  and  producer  edges 
respectively.  A  process  is  blocked  if  and  only  if  the 
number  of  request  edges  from  this  process  to  a  particular 
resource  exceeds  the  number  of  available  units  of  the 
resource.  A  process  is  deadlocked  when  it  is  impossible  to 
get the process out of the blocked state. In that approach,
a "graph reduction" method is introduced to check if a
process  is  deadlocked.  A  graph  reduction  corresponds  to  the 
best  set  of  operations  a  particular  process  can  execute  to 


Figure  2.4:  General  resource  graph  for  the  traffic 
deadlock  of  Figure  2.1. 

It  is  shown  that  for  a  general  resource  graph  in  which 
all  processes  having  requests  are  blocked,  the  existence  of 
a  directed  circuit  is  a  necessary  and  sufficient  condition 
for deadlock. It is further shown that for a general
resource system with single unit requests, a deadlock exists
if and only if there is a cycle in the graph. A simple
illustration of a general resource graph corresponding to
the traffic deadlock of Figure 2.1 is shown in Figure 2.4
(in the situation when all the cars intend to go straight).

It  is  evident  from  Figure  2.4  that  the  cars  A,  B,  C  and 
D  are  deadlocked.  Intuitively,  a  necessary  and  sufficient 
condition  for  the  existence  of  deadlock  is  that  there  exists 
a  circular  chain  of  processes  in  which  each  process  holds 
exclusive and non-preemptible control of some resource, and
is  requesting  access  to  at  least  one  resource  held  by  the 
next  process  in  the  circular  chain  of  waiting  processes. 

2.5 Comparison and Contrast of the Deadlock Problem in the Three Fields

Considerable  research  has  been  done  on  the  deadlock 
problem  in  operating  systems,  and  is  surveyed  in  Section 
2.6.1. Briefly, three broad categories of algorithms have
been  proposed: (i)  detection,  (ii)  avoidance,  and  (iii) 
prevention.  It  has  been  shown  that  in  certain  cases,  it  is 
possible  to  suspend  the  process  and  preempt  a  resource,  yet 
preserve  the  current  states  of  the  process  and  its  use  of 
that  resource  for  a  later  resumption. 

Even  though  the  general  principles  for  deadlock 
handling  that  have  been  developed  for  operating  systems  are 
applicable  in  database  systems,  several  additional  problems 
arise. The resources which processes may wish to lock for
exclusive  use  include  pieces  of  shared  data.  Very  often  the 
processes  may  issue  lock  requests,  the  locking  criterion  for 
which  depends  on  data  values  such  as:  "LOCK  THE  EMPLOYEE 
RECORDS  OF  ALL  EMPLOYEES  IN  THE  SYSTEMS  PROGRAMMING  DEPT." 
deadlock problem.

Autonomous behaviour of the components is the main
characteristic of operating systems for a network of
computers, and it greatly aggravates the control problems.
The presence of appreciable time lags makes synchronizing the
actions of the various controllers in the system much more
difficult. Moreover, the deadlock problem in distributed
systems is somewhat different, since in geographically
distributed databases the information needed to detect
deadlocks is not necessarily all available at any single
installation.  Consider  the  example  of  Figure  2.2,  but  with 
two separate computers C1 and C2, as shown in Figure 2.5.
Assume that process P and resource entity M reside at C1,
and that process Q and resource entity N reside at C2.
Processes P and Q, after updating the local data
resources M and N, proceed to the remote entities and become
involved in a deadlock which neither computer C1 nor C2 can
detect, based on the information available at their
respective  installations. 
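The point can be illustrated with wait-for edges. The Python sketch below assumes, hypothetically, that each site records an edge "waiter waits for holder" only for requests on entities it allocates: C1 allocates M and so sees Q waiting for P, while C2 allocates N and sees P waiting for Q. Neither local graph contains a cycle; only their union does. Collecting all edges at one site, as done here, is merely the simplest way to expose the cycle and is not the on-line scheme proposed in Chapter 3.

```python
def has_cycle(edges):
    """edges: (waiting process, holding process) pairs. Repeatedly drop
    processes that wait on nobody; whatever survives lies on or leads into a cycle."""
    succ = {}
    for waiter, holder in edges:
        succ.setdefault(waiter, set()).add(holder)
        succ.setdefault(holder, set())
    while True:
        free = [p for p, waits in succ.items() if not waits]
        if not free:
            return bool(succ)
        for p in free:
            del succ[p]
        for waits in succ.values():
            waits.difference_update(free)


# Figure 2.5: P waits for N at C2 (held by Q); Q waits for M at C1 (held by P).
C1_EDGES = [("Q", "P")]  # recorded at C1, which allocates M
C2_EDGES = [("P", "Q")]  # recorded at C2, which allocates N

print(has_cycle(C1_EDGES), has_cycle(C2_EDGES), has_cycle(C1_EDGES + C2_EDGES))
# -> False False True
```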


Figure 2.5: Distributed database deadlock:
Example of Figure 2.2 in a distributed environment.


Communication  delays  may  lead  to  synchronization 
problems  in  obtaining  an  accurate  view  of  the  status  of  the 
computer network. Typically, an existing deadlock may not
be detected, or a deadlock may be indicated where one no
longer exists. Synchronizing the updates of multiple-copy
files is non-trivial. Also, abort, rollback and recovery
become very involved and require further communication.

2.6 Survey of Deadlock Handling Schemes

This section presents the main deadlock handling algorithms and
approaches that have been proposed for operating systems,
database systems and distributed databases.

2.6.1 Operating Systems

A  plethora  of  research  papers  has  appeared  on  deadlocks 
in  operating  systems.  The  most  notable  among  them  are 
discussed  below. 

The approaches suggested by Havender [1958] exclude a
priori the possibility of deadlock by putting certain
constraints on the way in which requests for resources may
be made. In the first, all the required resources must be
requested and granted before the process can proceed. In a second
strategy, when a process holding certain resources is denied
a further request, the process must release all of its
original  resources  and  rerequest  them  together  with  the 
additional  ones.  The  third  strategy  utilizes  the  principle 
of  resource  ordering.  The  capabilities  of  these  three 
schemes  have  been  discussed  in  Section  2.3. 
The Banker's algorithm [Habermann, 1969] is a practical
example  of  avoidance.  The  approach  uses  the  "Maximum  Claims 
Strategy"  with  regard  to  information  about  the  future 
resource  requirements  for  each  process.  This  information  is 
provided  in  the  form  of  an  upper  bound  on  the  quantity  of 
each  resource  group  that  will  be  used  by  each  process.  A 
state is considered safe if there is a process which can run
to completion using only the available resources and those
already allocated to it, and if the state reached after that
process completes and releases its resources is also safe.
Deadlock avoidance is achieved by
testing  each  possible  allocation  and  making  only  those  which 
lead  to  safe  states.  If  the  process  which  is  the  originator 
of  this  request  can  run  to  completion  and  release  the 
resources  it  holds,  then  all  other  processes  in  the  system 
can  be  completed,  since  the  state  previous  to  the  request 
was  safe. 
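The safety test and the grant rule can be sketched in Python as follows. The sketch assumes that resources are given as vectors over resource groups, that max_claim holds each process's declared upper bounds, and that a request is only tentatively granted; the function names is_safe and try_grant are hypothetical, and the loop is the usual textbook form of the safety test rather than Habermann's original presentation.

```python
def is_safe(available, max_claim, allocation):
    """Safety test: can every process finish in some order?"""
    work = list(available)
    need = [[m - a for m, a in zip(mc, al)] for mc, al in zip(max_claim, allocation)]
    finished = [False] * len(allocation)
    progress = True
    while progress:
        progress = False
        for i in range(len(allocation)):
            if not finished[i] and all(n <= w for n, w in zip(need[i], work)):
                # Process i can obtain its remaining claim, finish, and return its holdings.
                work = [w + a for w, a in zip(work, allocation[i])]
                finished[i] = True
                progress = True
    return all(finished)


def try_grant(available, max_claim, allocation, proc, request):
    """Grant `request` to process `proc` only if the resulting state is safe."""
    need = [m - a for m, a in zip(max_claim[proc], allocation[proc])]
    if any(r > n for r, n in zip(request, need)):
        raise ValueError("request exceeds the declared maximum claim")
    if any(r > v for r, v in zip(request, available)):
        return False  # not enough free units: the process must wait
    new_available = [v - r for v, r in zip(available, request)]
    new_allocation = [row[:] for row in allocation]
    new_allocation[proc] = [a + r for a, r in zip(allocation[proc], request)]
    return is_safe(new_available, max_claim, new_allocation)
```

For example, with one resource group of 12 units, claims [10], [4] and [9], holdings [5], [2] and [2], and 3 units free, the state is safe. Granting one more unit to the third process is refused, because afterwards only the second process could be guaranteed to finish, whereas granting two units to the second process is accepted.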

Hebalkar [1970] uses a graph model to represent
processes of more general structure (asynchronous systems)
than those considered by Habermann. In that model, nodes
represent  transitions  of  a  computation  and  edges  represent 
the  demand  vectors  of  resources.  A  computation  splits  into 
parallel  subcomputations,  which  can  merge.  The  state  of  a 
system  can  thus  be  represented  by  a  cut-set  of  a  graph. 

Safe states as well as deadlock states can be represented in this
model.  Advance  information  in  the  form  of  demand  vectors  of 
resources  in  the  graph  model  has  been  used  to  design 
algorithms  to  prevent  deadlocks. 
Coffman et al. use the model illustrated in Section
2.4  to  provide  a  deadlock  detection  and  recovery  scheme  for 
the  case  of  more  than  one  resource  of  a  given  type.  Such  a 
method  needs  a  more  elaborate  state  description  mechanism 
than  a  state  graph,  and  is  supplemented  with  "allocation" 
and  "request"  matrices  and  an  "available  resources  vector". 
An  algorithm  is  designed  which  uses  these  data  structures  to 
discover  a  deadlock  by  simply  investigating  every  possible 
sequence  of  remaining  processes  to  be  completed.  This 
algorithm  precisely  identifies  the  complete  set  of 
deadlocked  processes,  and  runs  in  time  proportional  to  the 
square  of  the  number  of  processes. 
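
A minimal sketch of detection with an allocation matrix, a request matrix and an available-resources vector follows, assuming a process whose outstanding requests can be met will run to completion and release what it holds. Rather than listing every completion sequence explicitly, it repeatedly picks any process whose request can be met; since letting a process finish can only enlarge the free pool, this greedy order reaches the same verdict. The name `find_deadlocked` is hypothetical.

```python
from typing import List, Set


def find_deadlocked(available: List[int],
                    allocation: List[List[int]],
                    request: List[List[int]]) -> Set[int]:
    """Return the indices of processes that can never complete."""
    n = len(allocation)
    work = list(available)
    done = [False] * n
    changed = True
    while changed:
        changed = False
        for i in range(n):
            if not done[i] and all(r <= w for r, w in zip(request[i], work)):
                # Optimistically let process i finish and free its holdings.
                work = [w + a for w, a in zip(work, allocation[i])]
                done[i] = True
                changed = True
    # Whatever could not be reduced is exactly the deadlocked set.
    return {i for i in range(n) if not done[i]}
```

With two single-unit resources, where process 0 holds the first and requests the second while process 1 does the opposite, `find_deadlocked([0, 0], [[1, 0], [0, 1], [0, 0]], [[0, 1], [1, 0], [0, 0]])` reports processes 0 and 1, while the idle process 2 is reduced away.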

A further algorithm [Coffman et al., 1971] for deadlock
recovery  is  suggested,  by  preemption  of  a  subset  of 
resources  that  would  incur  a  minimum  cost.  An  efficient 
branch-and-bound  tree-search  technique  is  utilized  to  find  a 
minimum  cost  solution.  This  algorithm  facilitates  the 
inclusion  of  preemptible  resource  types  of  varying 
preemption  costs  in  the  notion  of  deadlock.  Thus,  the 
recovery  technique  is  designed  to  avoid  preemption  of 
resources  with  high  inherent  losses.  An  informal  discussion 
of  certain  theoretical  aspects  of  deadlock  avoidance  using 
information  on  resource  requirements  is  also  provided. 

A comprehensive proposal for deadlock handling [Holt,
1972] is dealt with in detail in Section 2.4. Holt
suggests  an  efficient  deadlock  detection  algorithm  for  the 
special  case  of  general  resource  systems  with  single  unit 
requests. The algorithm tests the general resource graph
for  the  existence  of  a  cycle.  This  is  achieved  by 
determining  if  the  successive  elimination  of  all  sink  nodes 
produces  predecessors  which  are  also  sinks.  A  sink  node  is 
one  with  no  edges  emanating  from  it  (i.e.  no  wait  requests). 
All  nodes  will  become  sinks  if  and  only  if  the  graph  is 
devoid  of  a  cycle.  The  mechanism  is  a  simplified  version  of 
successive  graph  reduction. 
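
The successive elimination of sinks can be sketched as follows, for a graph given as an adjacency map from each node (process or resource) to the nodes its edges point to. Keeping an out-degree count per node means each edge is handled a constant number of times, consistent with the linear-time bound stated for Holt's algorithms. The map encoding and the name `has_cycle` are assumptions for this example; with single-unit resources a remaining cycle signals deadlock.

```python
from typing import Dict, Hashable, List, Set


def has_cycle(graph: Dict[Hashable, Set[Hashable]]) -> bool:
    """Repeatedly delete sink nodes; a cycle exists iff some node survives."""
    nodes = set(graph) | {v for succ in graph.values() for v in succ}
    out_degree = {v: 0 for v in nodes}
    preds: Dict[Hashable, List[Hashable]] = {v: [] for v in nodes}
    for u, succ in graph.items():
        for v in succ:
            out_degree[u] += 1
            preds[v].append(u)
    sinks = [v for v in nodes if out_degree[v] == 0]   # no edges emanating
    removed = 0
    while sinks:
        v = sinks.pop()
        removed += 1
        for u in preds[v]:
            out_degree[u] -= 1       # deleting v removes the edge u -> v
            if out_degree[u] == 0:   # u has become a sink in turn
                sinks.append(u)
    return removed < len(nodes)
```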

An effective algorithm [Holt, 1972] is used to
determine  if  a  particular  blocked  process  is  deadlocked. 

The  algorithm  systematically  traces  out  all  paths  leading  to 
the  corresponding  process  node.  A  path  which  leads  to  a 
sink exists if and only if the process is not deadlocked.
Both  of  these  algorithms  execute  in  time  proportional  to  the 
number  of  edges  in  the  graph.  A  weighted  edge  is  used  to 
represent  allocation  of  multiple  units  of  a  resource  to  a 
process.
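
The test for a single blocked process can be sketched, assuming single-unit requests, as a search from that process which stops as soon as it reaches a sink. A resource node with edges to several holders (units allocated to different processes) can lead to a sink through any of them. The adjacency-map encoding and the name `is_deadlocked` are assumptions for this example, and edge weights are not modeled.

```python
from typing import Dict, Hashable, Set


def is_deadlocked(graph: Dict[Hashable, Set[Hashable]], process: Hashable) -> bool:
    """True iff no path from `process` reaches a sink (a node with no out-edges)."""
    seen = set()
    stack = [process]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        succ = graph.get(v, set())
        if not succ:
            return False   # a sink: the node can proceed and unblock the chain
        stack.extend(succ)
    return True            # every reachable node is itself waiting
```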

The  relationship  of  graph  reducibility  in  consumable 
resource  systems  and  the  security  from  deadlock  of  such  a 
system  is  investigated.  A  deadlock  detection  algorithm  for 
reusable  resource  systems  is  presented.  The  technique  of 
graph  reductions  is  used  to  improve  Habermann's  original 
deadlock  prevention  algorithm.  The  importance  of  graph 
reductions  in  investigating  the  deadlock  problem  is 
emphasized  [Holt,  1972]. 
2.6.2 Database Systems

Various  aspects  of  concurrent  operation  of  database 
processes  have  been  the  topics  of  active  research  in  the 
recent past. Several reports have dealt with the deadlock
problem in database systems; the most notable of them are
surveyed below.


Shoshani and Bernstein [1969] assume that a database can
be  represented  by  a  graph,  with  nodes  representing 
collections  of  information.  A  directed  edge  exists  between 
two  nodes,  if  one  node  contains  the  address  of  the  other. 
Such  a  database  is  assumed  to  be  accessed  by  a  set  of 
routines  called  primitives.  Under  concurrent  operation,  two 
or  more  primitives  may  conflict  with  each  other  while 
accessing  a  node.  To  overcome  these  conflicts  a  procedure 
called  the  "walking  algorithm"  was  formulated.  It  requires 
that  (i)  a  primitive  lock  the  next  node  it  wishes  to  access 
before  unlocking  the  node  it  is  currently  accessing,  and 
(ii)  it  keep  a  node  locked  only  during  the  time  of  its 
accessing.  Further,  a  method  called  the  two-step  algorithm 
is  developed  to  overcome  the  drawbacks  (including  deadlock 
avoidance)  of  the  walking  algorithm. 
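
The walking rule (lock the next node before unlocking the current one, and keep a node locked only while accessing it) is what is now often called lock coupling or hand-over-hand locking. A minimal sketch on a singly linked chain of nodes follows, assuming one `threading.Lock` per node; the `Node` class and `walk_find` are hypothetical names. Because every walker here moves in the same direction, locks are always taken in chain order; in a general graph, two primitives walking toward each other can each hold the node the other wants next.

```python
import threading
from typing import Optional


class Node:
    """One collection of information in the database graph (hypothetical)."""

    def __init__(self, key, value, next_node: "Optional[Node]" = None):
        self.key = key
        self.value = value
        self.next = next_node          # "address" of the successor node
        self.lock = threading.Lock()


def walk_find(head: Node, key):
    """Walk from `head` and return the value stored under `key`, or None."""
    current = head
    current.lock.acquire()
    try:
        while current.key != key:
            nxt = current.next
            if nxt is None:
                return None            # key not present in the chain
            nxt.lock.acquire()         # (i) lock the next node first ...
            current.lock.release()     # ... then unlock the node being left
            current = nxt
        return current.value           # accessed while the node is locked
    finally:
        current.lock.release()         # (ii) no lock outlives the access
```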


The  LOCK-UNLOCK  mechanism  of  the  CODASYL  approach  to 
data  management,  which  enables  incremental  allocation  of 
data  resources  to  processes,  is  described  by  King  and 
Collmeyer [1973]. A database access state graph is used to
define  the  state  of  all  accesses  to  a  database.  Thus  the 
LOCK-UNLOCK mechanism can be modeled by state transition
functions  that  map  access  state  graphs  to  access  state 
graphs.  The  operations  of  the  functions  LOCK,  UNLOCK, 
ALLOCATE  and  DEALLOCATE  are  modeled.  Further,  it  is  shown 
that  the  ALLOCATE  function  is  the  only  one  that  can 
precipitate  a  deadlock.  A  necessary  and  sufficient 
condition  for  the  existence  of  a  deadlock  is  derived,  in 
terms  of  the  effect  of  the  ALLOCATE  function.  A  scheme  is 
devised  to  detect  a  deadlock  using  the  fact  that  one  can 
occur  only  when  an  allocation  request  results  in  a  process 
being  blocked.  In  the  event  of  a  deadlock,  a  recovery 
scheme  is  suggested.  A  major  shortcoming  in  the  approach  is 
that  a  process  cannot  have  more  than  one  outstanding 
request.  We  have  stated  in  Chapter  1  that  in  real  world 
applications  it  is  a  very  common  necessity  for  a  process  to 
have  at  one  time  more  than  one  outstanding  request. 


One proposed technique [Chamberlin et al., 1974] is a
shrewd  modification  and  combination  of:  (i)  trying  to 
preclaim  needed  resources;  (ii)  if  preclaiming  data 
resources  leads  to  a  deadlock,  preempting  data  resources; 
and  (iii)  imposing  a  presequencing  scheme  for  processes  by 
time  stamping,  to  avoid  deadlock  due  to  indefinite  delay. 

The  method  requires  each  process  to  lock  all  of  its  data 
resources  during  a  "seize  phase",  before  starting  the 
"execution  phase".  The  concurrent  running  of  the  seize 
phases  of  processes  raises  the  deadlock  problem.  During  the 
seize phase, preemption of locked resources from a process
is possible, and backing up a process to the start of its
seize  phase  is  easy.  Once  in  its  execution  phase,  a  process 
is  not  allowed  to  claim  more  resources.  The  end  of  an 
execution  phase  is  first  signalled  by  the  release  of  all 
resources  held,  followed  by  the  possible  starting  of  a  new 
seize  phase.  An  age  indicator  attached  to  the  processes  is 
used to avoid deadlock due to indefinite delay of processes;
thus the method is deadlock-free. A formal proof that the
scheme is deadlock-free has also been given. Because no
further resources may be claimed once the execution phase
has begun, the method rules out dynamic resource allocation
based on previous or current calculations.

One-level  lockout  and  two-level  lockout  mechanisms 
[Schlageter,  1975]  are  considered  for  synchronizing  database 
access.  In  the  one-level  lockout  scheme,  the  readers  can 
access  the  database  at  any  time,  regardless  of  the 
allocation  state,  whereas  the  writers  are  required  to  lock 
the  data  resources.  In  the  two-level  lockout  scheme, 
readers  may  share  the  same  data,  but  have  the  right  to 
prevent  writers  from  accessing  this  data.  This  is 
implemented  by  using  primitives  LOCKR  (read)  and  LOCKW 
(write). Data locked by LOCKR can be accessed by any
reader.  Data  locked  by  LOCKW  can  be  accessed  by  readers 
which  do  not  need  to  be  protected  against  changes  of  data. 
Presence  of  a  cycle  in  the  process  graph  is  shown  as  a 
necessary  and  sufficient  condition  for  deadlock  existence. 
Given  the  process  graph,  the  deadlock  detection  procedure 
starts at the blocked process node and tests if a path
reaches  the  process  node  again.  Under  the  two-level  lockout 
mechanism,  starting  at  a  blocked  process  node  and  traversing 
paths  to  detect  deadlocks  is  no  longer  simple. 
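
For the one-level lockout case, where every edge of the process graph simply means "waits for", the test can be sketched as a depth-first search from the newly blocked process. The sketch assumes the graph was cycle-free before the new wait edge was added, so any new cycle must pass through that process; the dictionary encoding and the name `closes_cycle` are assumptions. It does not attempt the two-level (LOCKR/LOCKW) case.

```python
from typing import Dict, Hashable, Iterable


def closes_cycle(waits_for: Dict[Hashable, Iterable[Hashable]],
                 blocked: Hashable) -> bool:
    """True if some path of wait edges leads from `blocked` back to itself."""
    stack = list(waits_for.get(blocked, ()))
    seen = set()
    while stack:
        p = stack.pop()
        if p == blocked:
            return True            # returned to the start node: deadlock
        if p in seen:
            continue               # already explored from another path
        seen.add(p)
        stack.extend(waits_for.get(p, ()))
    return False
```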

One  scheme  proposed  for  deadlock  avoidance  in  database 
systems  [Lomet,  1977]  requires  the  processes  to  pre-declare 
their  anticipated  resource  requirements,  with  the  system 
granting  only  safe  requests.  The  algorithm  is  tailored  to 
the  needs  of  database  systems,  unlike  the  approaches  of 
Habermann [1969] and Holt [1972]. A series of time-varying
graph  representations  is  defined  for  database  interactions. 

A "holds-claims graph" represents only those processes that
are currently making a claim on some common system
resources. A "claims-claims graph" represents the processes
which  are  potentially  capable  of  denying  resources  to  one 
another,  their  claims  being  in  contention.  A  "holds-holds 
graph"  represents  the  allocation  status  of  the  system.  A 
deadlock  exists  if  and  only  if  there  is  a  cycle  in  the 
holds-holds  graph,  whereas  a  deadlock  can  be  avoided  if  and 
only  if  the  holds-claims  graph  is  cycle-free.  A  deadlock 
avoidance  scheme  has  been  devised  which  performs  appropriate 
actions on claims-claims, holds-claims, and holds-holds
graphs  in  the  event  of  process  entry/deletion,  and  resource 
request/release.  Even  though  the  possibility  of  indefinite 
delay  is  not  entirely  eliminated,  it  is  reduced  as  a  result 
of  the  strategy  of  incrementally  granting  requests. 
2.6.3 Distributed Databases

Relatively  little  work  has  been  reported  on  deadlock  in 
distributed  systems.  A  thorough  discussion  of  all  earlier 
attempts to handle deadlocks in distributed databases, and
their drawbacks, is provided below. A new approach suggested by
us  is  dealt  with  in  detail  in  Chapter  3. 

Prevention  of  deadlocks  in  distributed  databases  has 
been the subject of papers by Chu and Ohlmacher [1974] and
Maryanski [1977]. In their first approach Chu and Ohlmacher
require  that  all  data  resources  be  allocated  to  the 
processes  before  initiation,  which  in  turn  needlessly  delays 
the  processes.  Their  second  technique  is  based  on  the 
concept  of  a  process  set,  which  is  a  collection  of  processes 
with  access  to  common  data  resources.  A  process  is  allowed 
to  proceed  only  if  all  data  resources  required  by  the 
process  and  the  members  of  its  process  set  are  available. 

In Maryanski's proposal, each process has to communicate its
shared  data  resources  list  (conceptually  similar  to  a 
process  set)  to  all  other  processes  before  it  can  proceed. 
This  shared  data  resources  list  is  determined  by  using  what 
is  called  a  process  profile,  which  contains  a  list  of  data 
resources  that  can  be  updated  by  the  process.  The 
communication  and  computation  of  process  sets  (or  shared 
data  resources  list),  which  is  performed  continually  as 
processes  enter  and  leave  the  system,  makes  heavy  demands  on 
the  system. 
Techniques for deadlock detection in a network
environment [Chandra et al., 1974; Mahmoud and Riordon,
1976, 1977; Goldman, 1977] have been proposed. At each
installation, Chandra et al. require the maintenance of a
resource table which contains information pertaining to
local  resources  allocated  to  processes,  processes  waiting 
for  access  to  local  resources,  remote  resources  allocated  to 
local  processes,  and  local  processes  waiting  for  access  to 
remote  resources.  The  type  of  access  requested  by  the 
process  is  also  stored.  They  hypothesized  that  by  using 
such  tables,  well  known  algorithms  for  detecting  deadlocks 
in  a  single  system  could  be  extended  to  detect  deadlocks  in 
a  network  of  computers,  by  communication  between 
installations and appropriate expansion of resource tables.
Schemes  to  expand  resource  tables  in  a  network  environment 
were  included.  However,  Goldman  has  shown  an  example  in 
which  a  deadlock  is  not  detected  by  their  proposed  scheme. 


Mahmoud and Riordon's Centralized Control approach to
deadlock detection in distributed databases creates an
overall  picture  of  the  global  network  status  by  using  file 
queues  and  pre-test  queues  (a  queue  of  requests  which  can 
only  be  granted  at  a  future  time)  received  from  all  other 
installations  in  the  network.  As  the  identifiable  unit  of 
object-data becomes smaller in size, message congestion at
the control node increases, degrading network performance.
The authors also propose a Distributed Control
approach in which, in a network of 'n' computers, each
installation transmits (n-1) identical messages containing
status  and  queues  of  files.  Each  installation  thus  receives 
(n-1) different messages. As shown by Goldman, this approach
has a flaw: a certain deadlock goes undetected. In
any  case,  all  these  proposals  require  the  communication  of 
large  tables  between  installations. 

The detection schemes of Goldman are based on the
creation  and  expansion  of  an  Ordered  Blocked  Process  List 
(OBPL).  An  OBPL  is  a  list  of  processes  in  which  each 
process  except  the  last  one  is  waiting  for  a  data  resource 
held  by  the  next  process  in  the  list.  Whenever  an  OBPL  is 
transmitted  between  installations,  a  data  resource  name  is 
inserted  into  the  "single  data  resource  identification  part" 
of  the  OBPL.  The  last  process  in  the  list  either  has  access 
to,  or  is  waiting  for,  that  resource.  In  the  former  case 
the  state  (blocked  or  active)  of  the  last  process  in  the 
OBPL  must  be  determined,  while  in  the  latter  case  one  needs 
to  know  the  state  of  the  process  which  holds  the  data 
resource.  Goldman  proposes  techniques  to  determine  these 
states  and  to  eventually  detect  deadlock  (if  any). 
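
To make the list structure concrete, here is a single-site sketch of OBPL expansion, assuming, as Goldman does, at most one outstanding request per process. It ignores the transmission of the list between installations and the single data resource identification part; the maps `waiting_for` and `holder` and the function name are hypothetical. When the holder of the awaited resource is already on the list, a cycle has been closed.

```python
from typing import Dict, List, Optional


def expand_obpl(start: str,
                waiting_for: Dict[str, str],
                holder: Dict[str, str]) -> Optional[List[str]]:
    """Grow an Ordered Blocked Process List starting at `start`.

    waiting_for: process -> the single data resource it is waiting for
    holder:      data resource -> the process currently holding it
    Returns the deadlocked cycle of processes, or None if none is found.
    """
    obpl = [start]
    while True:
        last = obpl[-1]
        resource = waiting_for.get(last)
        if resource is None:
            return None            # last process is active, so the chain drains
        nxt = holder.get(resource)
        if nxt is None:
            return None            # resource is free; the request can be granted
        if nxt in obpl:
            # Each listed process waits for the next one; meeting a listed
            # process again closes a cycle of mutual waiting.
            return obpl[obpl.index(nxt):]
        obpl.append(nxt)
```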

Even  though  Goldman's  method  seems  sound,  it  does  have 
some  shortcomings.  For  instance,  no  process  may  have  more 
than  one  outstanding  resource  request,  which  is  not 
generally  the  case  in  real  world  situations,  as  illustrated 
in  Appendix  B.  Also,  when  several  readers  share  access  to  a 
data resource, Goldman requires the creation and expansion
of a separate copy of the OBPL for each reader, since if
any one of the readers is deadlocked, then so is any process
which requests access to that resource. It is possible that
OBPLs, while undergoing expansion, could be transferred
(sequentially) among several installations, or several times
between the same two installations, before a deadlock is
detected. Furthermore, OBPLs could become large, leading
to substantial overhead, especially when records or entities
are considered as data resources instead of files.


The  primary  disadvantage  of  all  the  existing  methods  is 
that  they  cannot  recognize  that  deadlock  is  imminent  without 
substantial  communications  between  the  computers  in  the 
network.  Thus,  the  algorithms  described  above  cannot  be 
used  effectively  for  on-line  detection  since  they  are  very 
susceptible to "synchronization error", in which either a
deadlock is indicated where one no longer exists or a
deadlock occurs and is not recognized. Such errors arise
when two autonomous computers concurrently allocate resources
before advising each other of their actions.


2.7  Combined  or  Mixed  Approach  to  Deadlock  Handling 

It is hypothesized [Howard, 1973] that none of the
three basic approaches alone (detection, avoidance,
prevention) suits the complete set of resource allocation
problems  in  operating  systems.  Instead,  different 
subproblems  of  resource  allocation  can  be  optimally  handled 
by  different  individual  techniques.  At  the  same  time,  all 
the  techniques  can  cooperate  globally  to  prevent  deadlocks. 
The basis for the method lies in the hierarchical structure
of  operating  systems.  Resource  ordering  in  a  hierarchical 
structure  provides  the  framework  for  a  mixed  technique.  In 
many  well-designed  operating  systems,  several  levels  of 
software  are  provided.  Each  level  modifies  and  extends  the 
capabilities  provided  by  the  underlying  level.  The 
implementation  of  an  operation  in  a  higher  level  is  achieved 
by  creating  a  lower  level  process.  Such  a  created  process 
performs  the  actions,  obtains  the  results  and  indicates  its 
completion  through  a  message  to  the  higher  level  process. 
Such  messages  are  treated  as  consumable  resources  by 
deadlock  handling  mechanisms.  In  the  case  of  resource 
ordering,  this  implies  that  a  higher  level  process  cannot 
hold  any  resources  required  by  a  lower  level  process  it 
creates.  Such  a  restriction  automatically  enforces  a 
"natural"  ordering  of  resources  in  a  hierarchical  system. 
Resources  required  by  lower  level  processes  should  appear 
later  in  the  ordering.  The  ordering  is  also  applied  to 
classes  of  resources. 
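
A minimal sketch of prevention by resource ordering: each resource (or resource class) gets a rank, and a process may only request a resource of higher rank than any it already holds, so no circular wait can form. The class names and numeric ranks are hypothetical; in the hierarchical scheme described above, resources required by lower-level processes would receive the later (higher) ranks.

```python
import threading
from typing import List


class OrderedResource:
    """A resource (or resource class) with a fixed position in the ordering."""

    def __init__(self, name: str, rank: int):
        self.name = name
        self.rank = rank
        self.lock = threading.Lock()


class ProcessHoldings:
    """Tracks what one process holds and enforces strictly ascending ranks."""

    def __init__(self):
        self.held: List[OrderedResource] = []   # kept in ascending rank order

    def acquire(self, res: OrderedResource) -> None:
        if self.held and res.rank <= self.held[-1].rank:
            # Requesting "backwards" could close a cycle, so it is refused.
            raise RuntimeError(
                f"{res.name} (rank {res.rank}) requested after rank "
                f"{self.held[-1].rank}")
        res.lock.acquire()         # may wait, but only for a higher-ranked resource
        self.held.append(res)

    def release_all(self) -> None:
        while self.held:
            self.held.pop().lock.release()
```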

Howard  provides  the  following  example  involving  the  CDC 
6600  in  use  at  the  University  of  Texas  at  Austin.  The 
system  resources  are  classified  into  the  following 
categories:

(i)  space  in  the  swapping  store; 

(ii)  assignable  devices  such  as  tape  drives,  and  job 
resources  such  as  access  to  files; 

(iii)  central  memory  for  user  jobs;  and 
(iv) internal resources such as memory for transient system
overlays,  and  channel  and  controller  access. 

A  combined  approach  to  the  deadlock  problem  in  the  above 
system works as follows: Preallocation of the maximum
requirement  for  each  process  seems  to  be  the  most  practical 
method  for  the  swapping  space.  For  the  job  resources, 
avoidance  is  the  most  reasonable  solution  since  it  is 
usually  possible  to  obtain  considerable  information  on  the 
future  resource  requirements  of  a  job  from  its  control 
cards.  Detection  and  preemption  are  not  necessarily 
suitable  due  to  possible  file  updates.  Prevention  using 
preemption  is  the  logical  preference  for  central  memory, 
provided  the  system  is  capable  of  swapping.  Detection  is 
also  a  possibility,  but  is  not  recommended  if  prevention  is 
also  possible.  Prevention  in  the  form  of  resource  ordering 
is  the  best  choice  for  internal  system  resources.  The 
hierarchical  nature  of  internal  system  structures  usually 
ensures a natural resource ordering. Even though it is
theoretically preferable to use a single algorithm, a
combined approach has practical advantages in its favour.

A  comprehensive  combined  approach  to  deadlock  handling 
in  database  systems  or  distributed  databases  has  not  been 
devised  so  far.  However,  the  idea  of  mixed  solutions  has 
been  applied  in  our  approach  (Chapter  3)  to  on-line 
detection  in  distributed  databases,  when  there  are  multiple 
outstanding  requests  on  a  data  resource  released  by  a 
completing process. In this case, even in a deadlock-free
system, an indiscreet resource allocation decision can
potentially lead to deadlock. We will show that there exists at
least  one  process  in  the  set  of  waiting  processes  such  that 
the  allocation  of  the  resource  to  this  process  maintains  the 
system  deadlock-free.  In  other  words,  a  potential  deadlock 
is  detected  and  avoided  accordingly.  The  approach  utilizes 
both  the  detection  as  well  as  the  avoidance  principles. 

2.8  Deadlock  Problem  vis-a-vis  Game  Playing 

The  deadlock  avoidance  problem  has  been  informally 
defined  as:  "From  some  a  priori  information  about  the 
processes,  the  resources,  the  operating  system,  etc., 
determine  what  situations  may  be  realized  without 
endangering the smooth running of the system" [Devillers,
1977]. These situations are termed safe, and the others
unsafe.

The  game-playing  model  of  the  deadlock  situation  has 
not  received  much  attention,  perhaps  because  it  appears  to 
be  practical  only  for  small  problems.  Nevertheless  it  does 
provide a valuable alternative viewpoint. The general
approach taken by Devillers [1977] is to recognize that the
resource allocation problem is effectively a zero-sum game
in which the resource manager (RM) is competing against
independent  processes  demanding  service.  In  this  case  the 
manager  may  be  thought  of  as  winning  if  all  the  processes 
complete  successfully,  while  the  opposition  wins  if  they 
create  a  deadlock.  A  flow  chart  model  which  in  some  sense 
generalizes Habermann's Maximum Claims model and Hebalkar's
Task Step model is proposed. Unfortunately, with this model
it  is  not  easy  to  define  the  safe  states,  not  only  because  a 
process  may  have  more  than  one  possible  series  of  steps  that 
it  may  traverse  (future  history),  but  also  because  there  may 
not  exist  a  simple  worst  possible  future  history  for  each 
process.  To  overcome  this  problem  Devillers  proposes  a 
global  approach.  During  the  execution  of  a  system  the 
processes  pass  through  various  states.  For  the  deadlock 
problem  three  types  of  states  are  considered. 

(a)  The  "working  state"  representing  the  state  of  being  in 
some  process  step. 

(b)  The  "transit  state"  in  which  some  process  has  completed 
one step and is waiting for the resource manager's
permission  to  enter  the  next  step  of  the  process. 

(c)  The  "terminated  state"  which  represents  a  process  that 
has  completed  all  the  steps  in  its  task. 

The states of the system can thus be characterized by the
status  of  the  various  processes  and  the  identity  of  the 
player  making  the  move.  Every  such  configuration  is 
associated  with  a  resource  need.  If  this  need  exceeds  the 
availability  of  some  resource  types,  an  "unattainable" 
configuration  and  set  of  states  will  result. 


The moves of the players are dependent on the
philosophy of the RM. The RM may detect at most one, at
least one, or any number of requests for transition, and
grant at most one or any number of requests in one move.
Corresponding games can be shown which are equivalent to the
deadlock  avoidance  problem.  The  possible  future  evolutions 
of  the  system  can  be  represented  by  the  "graph  of  this 
game",  where  the  nodes  are  the  states  and  the  edges  are  the 
moves.  This  is  a  finite  graph  with  an  infinite  associated 
game  tree.  However,  it  is  relevant  to  note  that: 

* The strategy (deterministic or probabilistic) of the
processes may not induce infinite games;

* The game stops when a terminal node (no successors) is
reached;

* The RM wins if the attained terminal node corresponds to
the process completion;

* The RM loses if the states attained are without attainable
successors, i.e. where none of the processes is in a
working state, some are in transit states, and no
requests may be granted without producing unattainable
configurations.

A  state  is  defined  safe  if  and  only  if,  from  this 
state,  a  strategy  exists  for  the  RM  which  ensures  its 
success  whatever  strategy  the  processes  choose.  A  state 
will  be  unsafe  if  and  only  if  it  is  not  safe.  A  state  will 
be  losing  if  and  only  if,  from  this  state,  a  strategy  exists 
for  the  processes  such  that  the  RM  will  lose  the  game 
whatever  strategy  it  chooses.  Devillers  presents  an 
algorithm for the construction of the set of unsafe states.
This approach throws new light on the deadlock problem, and
the notion of risk is explicitly formalized. It also
provides a basis for a systematic study of the properties of
the safe states.
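
The definitions of safe and unsafe states above are those of a two-player game on a finite graph. For illustration, the states from which the RM can force a win can be computed by the standard backward-induction ("attractor") construction for finite reachability games. This is a generic technique, not a reproduction of Devillers' algorithm: it treats any play that never reaches a completion node as a loss for the RM, and the encoding (`owner`, `succ`, `completed`) is hypothetical. The unsafe states are the complement of the returned set.

```python
from typing import Dict, Hashable, List, Set


def safe_states(owner: Dict[Hashable, str],
                succ: Dict[Hashable, Set[Hashable]],
                completed: Set[Hashable]) -> Set[Hashable]:
    """States from which the RM can force reaching a completion node.

    owner[s] is "RM" or "P" (the player to move in state s);
    succ[s] is the set of states reachable in one move;
    completed holds the terminal states in which all processes finished.
    """
    safe = set(completed)
    # For process-owned states: successors not yet known to be safe.
    remaining = {s: len(succ[s]) for s in owner}
    preds: Dict[Hashable, List[Hashable]] = {s: [] for s in owner}
    for s, nexts in succ.items():
        for t in nexts:
            preds[t].append(s)
    frontier = list(safe)
    while frontier:
        t = frontier.pop()
        for s in preds[t]:
            if s in safe:
                continue
            if owner[s] == "RM":
                safe.add(s)               # the RM can choose the move to t
                frontier.append(s)
            else:
                remaining[s] -= 1
                if remaining[s] == 0:     # every process move stays safe
                    safe.add(s)
                    frontier.append(s)
    return safe
```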

2.9  On  Probabilistic  Models  of  Deadlock 

The approach to the deadlock problem taken by Ellis
[1974] is different from all previous approaches in the
sense  that  probabilistic  techniques  are  used  to  investigate 
the  likelihood  of  deadlock  occurrence  in  certain  classes  of 
computer systems. Any system state diagram used to
represent process-resource interactions can also be viewed
as  a  finite  state  automaton,  provided  that  the  number  of 
states  is  finite.  A  probability  measure  can  be  attached  to 
an  occurrence  of  each  possible  transition  out  of  a  given 
state.  The  sum  of  the  probabilities  associated  with 
transitions  out  of  a  given  state  is  required  to  be  unity. 
Ellis  assumes  a  random  resource  allocation  model,  which 
forms  a  Markov  chain.  By  adding  an  auxiliary  storage  to  the 
automaton, first-come-first-serve (FCFS) and
last-come-first-serve (LCFS) schedulers can be modeled to form,
respectively,  a  probabilistic  queue  automaton  and  a 
probabilistic  push  down  automaton.  The  probability  of 
deadlock  is  measured  in  terms  of  the  expected  value  of  (i) 
the  number  of  system  actions  to  deadlock  or  (ii)  the  number 
of  resource  allocations  to  deadlock.  Calculations  are 
carried  out  for  systems  containing  small  numbers  of 
processes  and  resources.  For  a  system  with  2  resources  and 
2  processes  the  mean  time  to  deadlock  under  FCFS  or  LCFS 
scheduling is shown to be slightly less than that under
random allocation. The intuitive expectation that the
probability of deadlock decreases as the number of units of
the resources increases, while the number of processes
remains fixed, is substantiated. Conversely, in a
fixed  resource  system,  increasing  the  number  of  processes 
increases  the  probability  of  deadlock  since  more  processes 
would  be  competing  for  the  same  number  of  resources. 
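
The expected number of actions to deadlock in such a model corresponds to the mean time to absorption of an absorbing Markov chain: if Q holds the transition probabilities among the non-deadlock (transient) states, the vector t of expected steps solves (I - Q) t = 1. The following exact-arithmetic sketch assumes deadlock is reachable from every transient state, so that the system has a unique solution; the function name and the toy chain below are illustrative, not one of Ellis's models.

```python
from fractions import Fraction
from typing import List, Sequence


def expected_steps_to_absorption(Q: Sequence[Sequence]) -> List[Fraction]:
    """Solve (I - Q) t = 1 by Gauss-Jordan elimination in exact arithmetic.

    Q[i][j] is the probability of moving from transient state i to transient
    state j in one action; the remaining probability mass leads to deadlock.
    """
    n = len(Q)
    # Augmented matrix [I - Q | 1].
    A = [[Fraction(int(i == j)) - Fraction(Q[i][j]) for j in range(n)]
         + [Fraction(1)] for i in range(n)]
    for col in range(n):
        # Find a usable pivot (fails if the system is singular).
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [x / pv for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]
```

For a toy chain in which state 0 always moves to state 1, and state 1 deadlocks or returns to state 0 with probability 1/2 each, `expected_steps_to_absorption([[0, 1], [Fraction(1, 2), 0]])` gives 4 expected actions from state 0 and 3 from state 1.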

However, it is not intuitively clear what happens to the
deadlock probability if the numbers of processes and
resources are increased together. For small systems Ellis
has shown that the probability of deadlock actually
increases. However, since the models considered no more than
5 resources and processes, which is by no means large in the
commercial world, more research is clearly needed in that
area.

2.10  Current  Implementations  of  Deadlock  Handling  Schemes 

The  internal  structure  of  the  system-wide  shared-file 
table  in  the  Michigan  Terminal  System  (MTS)  is  explained  in 
detail  here.  MTS  is  a  major  operating  system  under 
continuous  development  by  several  major  universities  in 
various  countries.  Before  any  specific  file  operation  is 
performed,  the  files  are  locked  in  MTS  in  one  of  three 
inclusive  levels  (read,  update,  or  destroy).  In  order  to 
ensure  that  the  rules  of  concurrent  usage  are  not  violated 
before  locking,  MTS  maintains  a  table  indicating  at  any 
instant:

*  all  the  files  currently  open  and/or  locked, 

*  how  they  are  locked  and  by  what  task, 

* what tasks are currently waiting to lock a file, and how
they are waiting.

This  table  facilitates  determining  whether  or  not  a 
particular  type  of  opening  and/or  locking  of  a  file  can  be 
allowed,  in  accordance  with  the  following  rules  of 
concurrent  usage: 

(i)  If  a  file  is  not  locked  for  updating  or  destroying,  any 
number  of  tasks  can  have  this  file  locked  simultaneously 
for  reading; 

(ii)  If  a  file  is  not  locked  for  reading  or  destroying,  then 
only  one  task  can  have  this  file  locked  for  updating 
(writing,  emptying  or  truncating);  and 

(iii)  If  a  file  is  not  open,  and  not  locked  for  reading  or 
updating,  then  only  one  task  can  have  this  file  locked 
for destroying (or renaming or permitting).

If  a  file  cannot  be  locked  as  requested,  the  task  is 
queued  to  wait  on  the  file.  Waiting  can  lead  to  deadlock 
due  to  mutual  blocking  of  tasks.  Blocking  is  defined  as 
follows:

(a)  a  task  waiting  to  destroy  a  file  is  blocked  by  a  task 
with  the  file  open; 

(b)  a  task  waiting  to  update  or  destroy  a  file  is  blocked  by 
a  task  with  the  file  locked  to  read; 

(c)  a  task  waiting  to  read,  update,  or  destroy  a  file  is 
blocked by a task with the file locked for update; and

(d)  a  task  waiting  to  open,  read,  update,  or  destroy  a  file 
is  blocked  by  a  task  with  the  file  locked  to  destroy. 
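
Rules (a) to (d) amount to a small compatibility table. The sketch below encodes it as a map from the mode a task is waiting for to the held modes that block it, and derives the BLOCKS pairs from which task holds what. The mode names, the dictionaries and `blocks_relation` are assumptions for illustration, not the internal representation used by MTS.

```python
from typing import Dict, Set, Tuple

# Waiting mode -> held modes that block it, transcribed from rules (a)-(d).
BLOCKED_BY: Dict[str, Set[str]] = {
    "open":    {"destroy"},                             # (d)
    "read":    {"update", "destroy"},                   # (c), (d)
    "update":  {"read", "update", "destroy"},           # (b), (c), (d)
    "destroy": {"open", "read", "update", "destroy"},   # (a), (b), (c), (d)
}


def blocks_relation(holds: Dict[str, Dict[str, Set[str]]],
                    waits: Dict[str, Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Return the pairs (t1, t2) such that task t1 BLOCKS task t2.

    holds: task -> {file name: modes in which the task has it open/locked}
    waits: task -> (file name, mode) the task is queued for
    """
    pairs = set()
    for t2, (fname, mode) in waits.items():
        for t1, files in holds.items():
            if t1 != t2 and BLOCKED_BY[mode] & files.get(fname, set()):
                pairs.add((t1, t2))
    return pairs
```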

The method that MTS employs to detect a deadlock involving multiple files is to define a relation BLOCKS, where Task T1 BLOCKS Task T2 if and only if T1 has a file open and/or locked in such a way that T2 is blocked from using that file. An M x M matrix representing the relation BLOCKS is constructed, where M is the total number of tasks that have files open and/or locked and that are either blocking another task or being blocked. The transitive closure BLOCKS+ of the relation BLOCKS is computed using Warshall's algorithm [Warshall, 1962]. A deadlock exists when Task T BLOCKS+ Task T holds for some task T.
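As an illustration of the MTS check just described, the following Python sketch computes BLOCKS+ with Warshall's algorithm and reports every task T for which T BLOCKS+ T holds. It assumes the BLOCKS relation is already available as an M x M Boolean matrix indexed by task number; the function names and the four-task matrix are hypothetical and are not taken from MTS.

```python
def transitive_closure(blocks):
    """Warshall's algorithm: closure[i][j] is True iff task i BLOCKS+ task j."""
    m = len(blocks)
    closure = [row[:] for row in blocks]  # copy so the input matrix is untouched
    for k in range(m):
        for i in range(m):
            if closure[i][k]:
                for j in range(m):
                    if closure[k][j]:
                        closure[i][j] = True
    return closure


def deadlocked_tasks(blocks):
    """Tasks T with T BLOCKS+ T, i.e. tasks on a cycle of mutual blocking."""
    closure = transitive_closure(blocks)
    return [t for t in range(len(blocks)) if closure[t][t]]


# Hypothetical example: T0 BLOCKS T1, T1 BLOCKS T2, T2 BLOCKS T0, T2 BLOCKS T3.
blocks = [
    [False, True, False, False],
    [False, False, True, False],
    [True, False, False, True],
    [False, False, False, False],
]
print(deadlocked_tasks(blocks))  # [0, 1, 2]
```

The triple loop is O(M³), as in Warshall's original formulation; T3 is blocked but, since it blocks nobody, it is not reported.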


Data management systems, on the other hand, vary in the way they detect and resolve deadlock. As indicated elsewhere, many of the early systems implemented quite rudimentary techniques. In some systems, such as IDMS (Integrated Database Management System, marketed by Cullinane Corp.) and TOTAL (CINCOM, Inc.), deadlock is not possible because of restrictions on processes: a process can lock only one record at a time. Such heavy restrictions are placed on users to ensure that deadlocks never occur; consequently, these systems are neither general nor realistic.


In the data management system ADABAS (Adaptable DAta BAse System, a product of Software AG, West Germany), if a process requests a locked record five times, the record is unlocked; the requesting process then gets hold of the record and proceeds. Quite often a process, after locking a record, may attempt to update it, only to find that it has lost control of the record; the record then has to be reread and the update redone. This method leads to unnecessary preemptions and is also subject to a peculiar situation in which two processes may perpetually cycle through locking, losing control before updating, and relocking the record concerned, as outlined in Section 2.3 and Appendix A.




CHAPTER  3 


'ON-LINE'  DEADLOCK  DETECTION  IN  DISTRIBUTED  DATABASES 

3.1  Graph-Theoretic  Model  and  Concepts 

Certain important concepts with direct relevance to our algorithms are defined below. We adopt a graph-theoretic deadlock model proposed for operating systems [Holt, 1972] and extend it to represent process interactions in a distributed database.

The set of data resources (typically files, fields, records, or entities), represented by $D = \{D_1, D_2, \ldots, D_m\}$, is held by a set of processes $P = \{P_1, P_2, \ldots, P_n\}$ running concurrently (intuitively, processes initiated so far but not yet completed) in a network of computers. A directed graph with a node for each process in P and each data resource in D, and with edges between nodes representing process interactions in the system, can be used to depict the system status. We formalize this in Definition 1.

Definition 1: A system graph $G_s = (N, E)$ is a directed bipartite graph¹ whose disjoint sets of nodes are those corresponding to P and D, called process nodes and data resource nodes respectively, such that $N = P \cup D$. An edge directed from a data resource node $D_i$ to a process node $P_j$, denoted $(D_i, P_j)$, is called a resource-process access (RPA) edge; it specifies that data resource $D_i$ is held by process $P_j$. Similarly, a directed edge from a process node $P_r$ to a data resource node $D_s$, denoted $(P_r, D_s)$, is called a process-resource wait (PRW) edge; it indicates that process $P_r$ is waiting for access to data resource $D_s$. E is the union of the sets of RPA edges and PRW edges. The type of access granted by an RPA edge or requested by a PRW edge is indicated by the letter "e" (exclusive) or "s" (shared).


Remarks: A process cannot hold a data resource and simultaneously be waiting for access to it (i.e., it cannot be self-blocking). It is thus necessary for a process to declare its most restrictive use of a data resource, so that it does not get blocked waiting for exclusive access to a resource on which it already has shared access. This implies that if $(P_i, D_j)$ is an edge in the graph, then there is no edge $(D_j, P_i)$, and vice versa.
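Definition 1 and the accompanying remark translate into a small data structure. The Python sketch below is one possible representation, assumed for illustration only: node names are strings, every edge stores its access mode ("e" or "s"), and the class and method names (SystemGraph, add_rpa, add_prw) are hypothetical. It rejects an edge whose reverse already exists, which is the self-blocking case excluded above.

```python
class SystemGraph:
    """Directed bipartite system graph G_s = (N, E) of Definition 1 (sketch).

    edges[src][dst] is the access mode, "e" (exclusive) or "s" (shared).
    RPA edges run resource -> process; PRW edges run process -> resource.
    """

    def __init__(self):
        self.processes = set()
        self.resources = set()
        self.edges = {}

    def _add(self, process, resource, src, dst, mode):
        if mode not in ("e", "s"):
            raise ValueError("mode must be 'e' (exclusive) or 's' (shared)")
        if process in self.resources or resource in self.processes:
            raise ValueError("process and resource node sets must be disjoint")
        if src in self.edges.get(dst, {}):
            # A process may not hold a resource and wait for it at the same time.
            raise ValueError(f"self-blocking: edge ({dst}, {src}) already exists")
        self.processes.add(process)
        self.resources.add(resource)
        self.edges.setdefault(src, {})[dst] = mode

    def add_rpa(self, resource, process, mode):
        """RPA edge (D_i, P_j): resource is held by process."""
        self._add(process, resource, resource, process, mode)

    def add_prw(self, process, resource, mode):
        """PRW edge (P_r, D_s): process waits for access to resource."""
        self._add(process, resource, process, resource, mode)


g = SystemGraph()
g.add_rpa("D1", "P1", "s")    # P1 holds D1 for shared access
g.add_prw("P2", "D1", "e")    # P2 waits for exclusive access to D1
try:
    g.add_prw("P1", "D1", "e")  # P1 would wait on a resource it already holds
except ValueError as err:
    print(err)  # self-blocking: edge (D1, P1) already exists
```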

Definition 2: The reachable set of a node $N_j$ of the system graph $G_s$, denoted by $R(N_j)$, is the set of all nodes in $G_s$ to which there exists a directed path from $N_j$.

1: For the definitions and understanding of graph-theoretic terms refer to: Deo, N., Graph Theory with Applications to Engineering and Computer Science, Prentice Hall, Inc., Englewood Cliffs, N.J., 1974.
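Definition 2 can be computed with an ordinary depth-first search. In this minimal sketch the graph is assumed to be a dictionary mapping each node to its successors, and reachable_set is a hypothetical helper name. A node appears in its own reachable set only if some directed path leads back to it.

```python
def reachable_set(edges, node):
    """R(node): all nodes reachable from `node` by a directed path of length >= 1."""
    seen = set()
    stack = list(edges.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(edges.get(current, ()))
    return seen


# Hypothetical graph: P1 waits for D1 (held by P2); P2 waits for D2 (held by P3).
edges = {"P1": ["D1"], "D1": ["P2"], "P2": ["D2"], "D2": ["P3"]}
print(sorted(reachable_set(edges, "P1")))  # ['D1', 'D2', 'P2', 'P3']
print(reachable_set(edges, "P3"))          # set()
```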


The notion of a reachable set, first introduced by Holt, is used effectively in deadlock detection. As soon as information on the addition or deletion of an edge in $G_s$ is received, the reachable sets can be updated to detect deadlocks on-line, as proposed in this chapter; this does not require the transmission of large tables between installations.
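The incremental update mentioned above can be illustrated for edge insertion. This is a sketch under stated assumptions rather than the procedure used in this thesis: reachable sets are kept in a dictionary, process nodes are recognised by a leading "P" (a hypothetical naming convention), and edge deletions, which can shrink reachable sets, are not handled. When an edge (u, v) is added, only u and the nodes that already reach u can change, and each of them gains v together with the old R(v).

```python
def add_edge_online(edges, reach, u, v):
    """Insert edge (u, v) and update all reachable sets incrementally (insertion only).

    `reach` maps every node to its current R(node). Returns the process nodes
    (assumed to be named "P...") that now lie in their own reachable set.
    """
    edges.setdefault(u, set()).add(v)
    reach.setdefault(u, set())
    reach.setdefault(v, set())
    gained = {v} | reach[v]                       # snapshot taken before updating
    affected = [x for x, r in reach.items() if x == u or u in r]
    for x in affected:
        reach[x] |= gained
    return sorted(x for x in affected if x.startswith("P") and x in reach[x])


edges, reach = {}, {}
add_edge_online(edges, reach, "D1", "P1")          # RPA: P1 holds D1
add_edge_online(edges, reach, "D2", "P2")          # RPA: P2 holds D2
add_edge_online(edges, reach, "P1", "D2")          # PRW: P1 waits for D2
newly = add_edge_online(edges, reach, "P2", "D1")  # PRW: P2 waits for D1
print(newly)  # ['P1', 'P2']
```

Every new path through (u, v) splits into an old path to u, the new edge, and an old path from v, so insertion never needs a full recomputation.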


A  deadlock  situation  may  arise  when  the  following 
necessary  conditions  hold: 

(a)  the  processes  request  exclusive  control  of  the  data 
resources  (for  updating); 

(b)  the  processes  hold  data  resources  allocated  to  them,  and 
wait  for  additional  ones  to  run  to  completion;  and 

(c)  the  preemption  of  data  resources  from  processes  cannot 
be  done  without  aborting  the  processes. 

We characterize this formally in Definition 3.

Definition 3: A state of deadlock in a "circular wait" condition is said to exist when, in a circular chain of processes, each process holds one or more data resources and has requested access to at least one data resource held by the next process in the chain.


3.2  A  Running  Example 


Figure 3.1: A system graph for a network configuration with four computers, $C_1$, $C_2$, $C_3$ and $C_4$, a set of concurrent processes $\{P_i\}$ and a set of data resources $\{D_j\}_{0 \le j \le 10}$.

In Figure 3.1, $\{P_0, P_1, P_2\}$, $\{P_3, P_4\}$, $\{P_5, P_6\}$ and $\{P_7, P_8\}$ are subsets of processes that exist at computers $C_1$, $C_2$, $C_3$ and $C_4$ respectively. Data resources $\{D_0, D_1, D_2\}$, $\{D_3, D_4, D_5\}$, $\{D_6, D_7\}$ and $\{D_8, D_9, D_{10}\}$ reside at $C_1$, $C_2$, $C_3$ and $C_4$ respectively.

The RPA edges are shown by solid lines and the PRW edges by dashed lines. The closely dotted and the sparsely dotted lines represent RPA and PRW edges, respectively, that have not yet been introduced into the system; these future requests are included to illustrate the various aspects of on-line deadlock detection. At computer $C_2$ of the network, $P_3$ holds $D_3$, $D_4$ and $D_6$, and is active. $P_4$ holds $D_5$ and is waiting for access to $D_4$ and $D_2$, and hence is blocked. Several other processes, among them $P_1$ (which is waiting for $D_3$ and $D_4$, both held by $P_3$), are also blocked; the remaining processes are active.

For instance, the reachable set of $P_4$ is $\{D_4, P_3, D_2, P_2\}$, whereas that of the active process $P_3$ is empty ($\emptyset$). If we assume that $D_7$ is held for shared access, then granting a further request for shared access to $D_7$ (which introduces a new RPA edge from $D_7$ to the requesting process) will cause the requesting process and $P_7$ to be deadlocked.
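The reachable sets quoted for Figure 3.1 can be checked mechanically. The sketch below models only the edges of the figure that can be read off this chapter ($P_3$ holding $D_3$, $D_4$ and $D_6$; $P_4$ holding $D_5$ and waiting for $D_4$ and $D_2$; $P_2$ holding $D_2$; $P_1$ waiting for $D_3$ and $D_4$). The other edges of the figure are omitted, and the string encoding of nodes is an assumption.

```python
def reachable_set(edges, node):
    """R(node) by depth-first search over a dict of successor lists."""
    seen, stack = set(), list(edges.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(edges.get(current, ()))
    return seen


# Fragment of Figure 3.1 (assumption: only edges mentioned in the text).
edges = {
    "D3": ["P3"], "D4": ["P3"], "D6": ["P3"],  # RPA: P3 holds D3, D4, D6
    "D5": ["P4"], "D2": ["P2"],               # RPA: P4 holds D5, P2 holds D2
    "P4": ["D4", "D2"],                       # PRW: P4 waits for D4 and D2
    "P1": ["D3", "D4"],                       # PRW: P1 waits for D3 and D4
}
print(sorted(reachable_set(edges, "P4")))  # ['D2', 'D4', 'P2', 'P3']
print(reachable_set(edges, "P3"))          # set(): P3 is active
```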


3.3  Necessary  and  Sufficient  Conditions  for  Deadlocks 


We  have  defined  the  concepts  of  system  graph  and 
reachable  sets  in  Section  3.1;  these  are  used  in  this 
section  to  develop  necessary  and  sufficient  conditions  for 
the  existence  of  deadlocks. 


Let $P_j$ be a process. If $P_j$ belongs to its own reachable set, then there exists a path in $G_s$ which starts at $P_j$ and ultimately ends in $P_j$. Since $G_s$ is a bipartite graph, the edges in this path alternate between the two sets, PRW and RPA edges. This in turn means that process $P_j$ is waiting for a data resource held by another process, which in turn is waiting for a data resource held by the next one in line, and so on, until some process in the chain is waiting for the data resource held by $P_j$. Intuitively, each process in the circular chain holds the data resource for which the previous process in the chain has a waiting-access request, and is itself waiting for the data resource held by the next process in the chain, leading to a circular wait condition. Thus, if a process node $P_j$ belongs to its own reachable set, then process $P_j$ is deadlocked. Conversely, if a process $P_j$ is deadlocked, the reachable set of the process node $P_j$ contains $P_j$ itself. This provides a necessary and sufficient condition for the existence of deadlock, which we record in Theorem 1.


Theorem 1: A process $P_j$ is deadlocked in a circular wait condition if and only if the reachable set of the corresponding process node in $G_s$ contains the node itself. That is, process $P_j$ is deadlocked in a circular wait condition if and only if $P_j \in R(P_j)$.

Proof:

(<==) $P_j \in R(P_j)$ implies the existence of a path $\mathcal{P}$ (say) in $G_s$ starting at $P_j$ and terminating in $P_j$. Since $G_s$ is a bipartite graph, the edges in $\mathcal{P}$ alternate between the two sets of edges, PRW and RPA edges. Without loss of generality, let $\mathcal{P}$ be made of the edges $(P_j, D_{j_1}), (D_{j_1}, P_{j_1}), (P_{j_1}, D_{j_2}), \ldots, (P_{j_{k-1}}, D_{j_k})$ and $(D_{j_k}, P_j)$, where $D_{j_i}$ for $1 \le i \le k$ are data resources and $P_{j_i}$ for $1 \le i \le k-1$ are processes. Path $\mathcal{P}$ indicates that $P_j$ is waiting for access to $D_{j_1}$ held by process $P_{j_1}$, each process $P_{j_{i-1}}$ is waiting for $D_{j_i}$ held by process $P_{j_i}$ for all $i$, $2 \le i \le k-1$, and $P_{j_{k-1}}$ is waiting for $D_{j_k}$ held by $P_j$. In other words, each process in the circular chain of waiting processes is waiting for the next process to run to completion, which is the circular wait of Definition 3.

(==>) Suppose that $P_j$ is deadlocked in a circular wait condition but $P_j \notin R(P_j)$, so that there is no path starting at $P_j$ and terminating in $P_j$. Then there must exist a linear ordering of the processes with the following property: $P_a$ precedes $P_b$ if there is a path from $P_a$ to $P_b$ in $G_s$. Without loss of generality, let such a linear ordering starting with $P_j$ be $P_j, P_{j_1}, P_{j_2}, \ldots, P_{j_s}$, implying that $P_j$ is waiting for access to a data resource held by $P_{j_1}$, and $P_{j_{i-1}}$ is waiting for access to a data resource held by $P_{j_i}$, for all $i$, $2 \le i \le s$. Thus $P_{j_s}$ can run to completion. Therefore $P_{j_{i-1}}$ can run to completion as soon as $P_{j_i}$ terminates, for all $i = s, s-1, \ldots, 3, 2$. Ultimately, $P_j$ can run to completion when $P_{j_1}$ terminates, which contradicts the assumption that $P_j$ is deadlocked. ■
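Theorem 1 reduces the detection of a circular wait to a membership test. A minimal Python sketch, assuming the graph is a dictionary from each node to its successors; the helper names and the three-process graph are hypothetical and not part of Figure 3.1.

```python
def reachable_set(edges, node):
    """R(node) by depth-first search over a dict of successor lists."""
    seen, stack = set(), list(edges.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(edges.get(current, ()))
    return seen


def is_deadlocked(edges, process):
    """Theorem 1: P_j is deadlocked in a circular wait iff P_j is in R(P_j)."""
    return process in reachable_set(edges, process)


edges = {
    "D1": ["P1"], "D2": ["P2"],   # P1 holds D1, P2 holds D2
    "P1": ["D2"], "P2": ["D1"],   # each waits for the other's resource
    "P3": ["D1"],                 # P3 also waits for D1
}
print([p for p in ("P1", "P2", "P3") if is_deadlocked(edges, p)])  # ['P1', 'P2']
```

P3 is blocked behind the cycle but is not itself on it, so it fails the test; Corollary 2 treats such processes separately.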


If $P_i$ and $P_j$ are any two processes such that each of their reachable sets in $G_s$ includes both $P_i$ and $P_j$, then the reachable sets of both must be the same. Intuitively, if there is a path from $P_i$ to $P_j$ and vice versa, then all the nodes that are reachable from $P_i$ are also reachable from $P_j$, and vice versa. In Corollary 1 we arrive at a necessary and sufficient condition to identify all processes involved in a deadlock.


Corollary 1: A set of processes $\{P_{j_i}\}_{1 \le i \le k}$ is deadlocked in a circular wait condition if and only if $P_{j_i} \in R(P_{j_i})$ for all $i$, $1 \le i \le k$, and $R(P_{j_1}) = R(P_{j_2}) = \cdots = R(P_{j_k})$.

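Proof:

(<==) Since $P_{j_i} \in R(P_{j_i})$ for all $i$, $1 \le i \le k$, each process is deadlocked by Theorem 1. Moreover, since the reachable sets of all the processes are the same, each $P_{j_s}$ belongs to $R(P_{j_i})$ for every $i$, so every process of the set is reachable from every other and they lie in a common circular chain. Hence the set of processes is deadlocked.

(==>) Since each process is deadlocked, we have by Theorem 1 $P_{j_i} \in R(P_{j_i})$ for all $i$, $1 \le i \le k$. Due to the circular wait condition, each process node in $\{P_{j_i}\}_{1 \le i \le k}$ is reachable from the others. That is,

$R(P_{j_i}) \supseteq \bigcup_{1 \le s \le k,\ s \ne i} R(P_{j_s})$ for all $i$, $1 \le i \le k$.

Hence, $R(P_{j_1}) = R(P_{j_2}) = \cdots = R(P_{j_k})$. ■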
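Corollary 1 can be checked directly for a candidate set of processes. In the sketch below (hypothetical graph and names), two circular waits exist, {P1, P2} and {P3, P4}; P2 additionally waits on the second cycle, so the two cycles have different reachable sets, and P5 waits on the first cycle without belonging to it.

```python
def reachable_set(edges, node):
    """R(node) by depth-first search over a dict of successor lists."""
    seen, stack = set(), list(edges.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(edges.get(current, ()))
    return seen


def is_deadlocked_set(edges, subset):
    """Corollary 1: each P in subset lies in its own R, and all R(P) coincide."""
    if not subset:
        return False
    sets = [reachable_set(edges, p) for p in subset]
    return all(p in r for p, r in zip(subset, sets)) and all(r == sets[0] for r in sets)


edges = {
    "D1": ["P1"], "D2": ["P2"], "P1": ["D2"], "P2": ["D1", "D3"],  # cycle 1
    "D3": ["P3"], "D4": ["P4"], "P3": ["D4"], "P4": ["D3"],        # cycle 2
    "P5": ["D1"],                                                  # blocked only
}
print(is_deadlocked_set(edges, ["P1", "P2"]), is_deadlocked_set(edges, ["P1", "P3"]))  # True False
```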


Remarks: Theorem 1 and Corollary 1 are similar to the necessary and sufficient conditions for deadlock derived by Holt [1971] for a "reusable resource graph with single unit resources". The membership of a process node in its own reachable set is equivalent to the existence of a cycle in Holt's reusable resource graph.


Corollary 1 shows how simple it is to recognize the set of processes involved in a deadlock. In Corollary 2, we give a sufficient condition for a process that is not itself deadlocked to be blocked indefinitely by virtue of waiting for a deadlocked process; such a process is blocked forever.


Corollary 2: A process $P_j$ such that $P_j \notin R(P_j)$ in $G_s$ is blocked forever if $R(P_j) \supseteq R(P_i)$ for any process $P_i$ such that $P_i \in R(P_i)$ in $G_s$.

Proof: $P_i \in R(P_i)$ and $R(P_j) \supseteq R(P_i)$ imply $P_i \in R(P_j)$, so that $P_j$ is either waiting for $P_i$ to run to completion, or there is a sequence of processes from $P_j$ to $P_i$ each of which is waiting for the next one in the sequence to run to completion. In either case $P_j$ cannot run to completion, since $P_i$ is deadlocked in a circular wait condition by Theorem 1. ■


For instance, in the system graph of Figure 3.1, the introduction of a PRW edge from $P_3$ to $D_5$, consequent to $P_3$ requesting access to $D_5$, causes $P_3$ and $P_4$ to be deadlocked. $P_1$ is blocked forever but not deadlocked, since it waits for $P_3$ to release $D_3$ and $D_4$. The reachable set of $P_1$ is $\{D_2, D_3, D_4, D_5, P_2, P_3, P_4\}$, which contains the reachable set $\{D_2, D_4, D_5, P_2, P_3, P_4\}$ of the deadlocked processes $P_3$ and $P_4$.
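The example above can be reproduced with the fragment of Figure 3.1 that the text describes (only the edges mentioned in this chapter are modelled, which is an assumption), plus the new PRW edge from $P_3$ to $D_5$. The sketch applies Theorem 1 to find the deadlocked processes and Corollary 2 to find those blocked forever; classify is a hypothetical helper name.

```python
def reachable_set(edges, node):
    """R(node) by depth-first search over a dict of successor lists."""
    seen, stack = set(), list(edges.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(edges.get(current, ()))
    return seen


def classify(edges, processes):
    """Deadlocked processes (Theorem 1) and processes blocked forever (Corollary 2)."""
    reach = {p: reachable_set(edges, p) for p in processes}
    deadlocked = {p for p in processes if p in reach[p]}
    blocked_forever = {
        p for p in processes
        if p not in deadlocked and any(reach[p] >= reach[q] for q in deadlocked)
    }
    return deadlocked, blocked_forever


edges = {
    "D3": ["P3"], "D4": ["P3"], "D6": ["P3"], "D5": ["P4"], "D2": ["P2"],
    "P4": ["D4", "D2"], "P1": ["D3", "D4"],
    "P3": ["D5"],  # the new PRW edge: P3 requests D5
}
dead, forever = classify(edges, ["P1", "P2", "P3", "P4"])
print(sorted(dead), sorted(forever))      # ['P3', 'P4'] ['P1']
print(sorted(reachable_set(edges, "P1")))  # ['D2', 'D3', 'D4', 'D5', 'P2', 'P3', 'P4']
```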

3.3.1  Characteristics  of  Waiting  Processes 


In on-line algorithms for deadlock detection, the basic operation is to update the reachable sets every time a new edge is introduced into or deleted from $G_s$. If the introduction of a new edge causes deadlock, the entering edge has prompted a chain of processes to wait for each other. A process waits for other processes if the reachable set of the corresponding process node is non-null. Conversely, a process node with a non-null reachable set has the corresponding process waiting for at least one other process. We characterize this property of waiting processes in Lemma 1.


Lemma 1: A process $P_i$ waits for a process $P_j$ if and only if the process node $P_j \in R(P_i)$ in $G_s$.

Proof:

(<==) Since $P_j \in R(P_i)$, there exists a path from $P_i$ to $P_j$. Without loss of generality, let such a path be $(P_i, D_{i_1}), (D_{i_1}, P_{i_1}), \ldots, (P_{i_{k-1}}, D_{i_k}), (D_{i_k}, P_j)$. $P_i$ is waiting for the data resource $D_{i_1}$ held by $P_{i_1}$. Process $P_{i_{s-1}}$ is waiting for data resource $D_{i_s}$ held by process $P_{i_s}$ for all $s$, $2 \le s \le k-1$. $P_{i_{k-1}}$ is waiting for data resource $D_{i_k}$ held by $P_j$. Thus, $P_i$ is waiting for $P_j$.

(==>) $P_i$ waits for $P_j$ implies that either there is a data resource held by $P_j$ for which $P_i$ has an access request, or there is a sequence of processes from $P_i$ to $P_j$ such that each process holds at least one data resource and is waiting for access to the data resource held by the next process in the sequence. This in turn means that there exists a path from $P_i$ to $P_j$. ■


Remarks: A process $P_i$ waits for no data resource if $R(P_i) = \emptyset$, and thus can run to completion. At any given time, a process is active and not blocked if and only if the reachable set of the corresponding process node is null. Any process with a non-null reachable set is blocked, waiting for at least one process.
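Lemma 1 and the remarks that follow give the waits-for relation and the set of active processes directly from the reachable sets. The sketch below uses the Figure 3.1 fragment described earlier in this chapter (an assumption: only the edges mentioned in the text are included), before any future requests are introduced; waits_for is a hypothetical helper name.

```python
def reachable_set(edges, node):
    """R(node) by depth-first search over a dict of successor lists."""
    seen, stack = set(), list(edges.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(edges.get(current, ()))
    return seen


def waits_for(edges, processes):
    """Lemma 1: P_i waits for P_j iff P_j is in R(P_i)."""
    procs = set(processes)
    return {p: sorted(reachable_set(edges, p) & procs) for p in sorted(procs)}


edges = {
    "D3": ["P3"], "D4": ["P3"], "D6": ["P3"], "D5": ["P4"], "D2": ["P2"],
    "P4": ["D4", "D2"], "P1": ["D3", "D4"],
}
relation = waits_for(edges, ["P1", "P2", "P3", "P4"])
active = [p for p in relation if not reachable_set(edges, p)]  # R(P) empty
print(relation)  # {'P1': ['P3'], 'P2': [], 'P3': [], 'P4': ['P2', 'P3']}
print(active)    # ['P2', 'P3']
```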


3.4  Deadlock  Detection  in  Distributed  DBMS 

For the algorithms proposed in this chapter, all the information necessary to detect deadlocks is made available through a system graph at each installation. Maintaining the system graph is trivial, and requires communication only for processes and resources which are global in nature. For local processes, which access only local resources that have no global interactions of any kind, no communication is necessary. It is believed that in transaction processing systems 95-99% of processes fall into this category². However, maintaining the system graph for global processes, or for processes that interact with a global one, requires communication. This communication has little impact on the network, unlike earlier schemes, which make very heavy demands in order to determine the true network status. Processes that are local at a particular instant could become global at a later time, necessitating the transmission of a collection of accumulated resource allocations to the various installations. Such transitions of local processes to global status lead to a small incremental change in the size of the global system graph, and the ensuing communication is still modest, in the sense that it is incremental and dispersed over a period of time. Other approaches rely on a simultaneous exchange of status by every site in the network; the transmission delays due to the resulting heavy message traffic can lead to synchronization problems, which our more responsive method minimizes. Our approach also avoids the message congestion caused by the simultaneous transfer of large tables from every installation. The advantage of this method lies in its utility for on-line deadlock detection in distributed DBMS, which in turn means that corrective action can be taken earlier.

2: See

Bernstein, P.A., et al., "The Concurrency Control Mechanism of SDD-1: A System for Distributed Databases (The Fully Redundant Case)", IEEE Trans. on Software Engg., SE-4 (3), May 1978, pp. 154-158.

Stonebraker, M.R., "Concurrency Control and Consistency of Multiple Copies of Data in Distributed INGRES", Memorandum No. UCB/ERL M78/24, Electronics Research Lab., Univ. of California, Berkeley, Calif., U.S.A., May 1978.

Given the system graph $G_s$, the two significant steps in the detection of deadlocks are:

(i) to determine the reachable sets of all the nodes in the system graph $G_s$, and

(ii) to determine, using the reachable sets, whether the necessary and sufficient conditions for the existence of a deadlock are fulfilled.

ALGORITHM B

Input: System graph $G_s$.

Output: Processes involved in a deadlock (if detected).

B1: [Determine reachable sets of all nodes in $G_s$.] For every node $N_j \in N$, determine the reachable set $R(N_j)$. That is, $R(N_j)$ is the set of all nodes in N to which there exists a directed path from $N_j$.

B2: [Detect deadlock.] If $N_j \in N$ is a process node such that $N_j \in R(N_j)$, then $N_j$ is deadlocked. For every subset of process nodes $\{N_{j_i}\}_{1 \le i \le k}$ such that $N_{j_i} \in R(N_{j_i})$ for all $i$, $1 \le i \le k$, and $R(N_{j_1}) = R(N_{j_2}) = \cdots = R(N_{j_k})$, the corresponding subset of processes is deadlocked.


Remarks: The proof of correctness of the algorithm is straightforward and follows directly from Definition 2, Theorem 1 and Corollary 1.
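Algorithm B can be sketched in Python as follows. This is an illustrative rendering, not the thesis's implementation: step B1 runs a depth-first search from every node, and step B2 groups self-reachable process nodes by identical reachable sets, as Corollary 1 permits. The graph below is a hypothetical four-process cycle with one additional blocked process.

```python
from collections import defaultdict


def reachable_set(edges, node):
    """R(node) by depth-first search over a dict of successor lists."""
    seen, stack = set(), list(edges.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(edges.get(current, ()))
    return seen


def algorithm_b(edges, processes):
    """Return the deadlocked subsets of processes (sketch of steps B1 and B2)."""
    nodes = set(edges) | {v for succ in edges.values() for v in succ}
    # B1: reachable set of every node in G_s.
    reach = {n: reachable_set(edges, n) for n in nodes}
    # B2: self-reachable processes, grouped by identical reachable sets.
    groups = defaultdict(list)
    for p in sorted(processes):
        r = reach.get(p, set())
        if p in r:
            groups[frozenset(r)].append(p)
    return sorted(groups.values())


edges = {
    "D1": ["P1"], "P1": ["D2"], "D2": ["P2"], "P2": ["D3"],
    "D3": ["P3"], "P3": ["D4"], "D4": ["P4"], "P4": ["D1"],
    "P5": ["D1"],  # P5 is blocked behind the cycle but not part of it
}
print(algorithm_b(edges, ["P1", "P2", "P3", "P4", "P5"]))  # [['P1', 'P2', 'P3', 'P4']]
```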

We illustrate our approach by an example. Assume that, in the configuration of Figure 3.1, a process requests access to a data resource and that the resulting PRW edge causes $P_5$, $P_6$, $P_7$ and $P_8$ to be deadlocked. To detect such a deadlock, our mechanism determines the reachable sets of all nodes in $G_s$. In the example considered, the reachable sets of these processes are the same, each consisting of $P_5$, $P_6$, $P_7$, $P_8$ and the four data resources on the cycle. The existence of deadlock is detected by noting that each process in $\{P_5, P_6, P_7, P_8\}$ belongs to its own reachable set.


3.5  'On-Line'  Detection 

When dealing with concurrent database accesses, little is known about the probability of interference or deadlock. For transaction processing systems, Peebles and Manning [1978] firmly believe that interference is rare, and that elaborate avoidance or prevention mechanisms would not be economical. We agree, and advocate the use of deadlock detection in distributed systems. Further, to quote Le Lann [1978], "our conclusion will be that for systems which include a partitioned database and which provide for storage of pending requests, maintenance of internal integrity boils down to a problem of deadlock avoidance or detection with distributed control". As a consequence, and in view of the present-day trend towards increased concurrent access in systems, we recommend the use of on-line deadlock detection in distributed systems. It is our belief that such a method contributes substantially to increasing concurrency.

In  our  view,  deadlock  prevention  schemes  are  not 
justifiable  for  use  in  distributed  systems.  Processes  that 
are  not  known  to  be  nonconflicting  need  extensive 
coordination  and,  in  general,  substantial  communication 
among  installations  is  necessary  before  process  initiation. 
This  affects  system  performance  by  lowering  the  degree  of 
concurrency.  The  past  use  of  prevention  principles  was 
acceptable  because  of  low  levels  of  concurrency  in  systems 
rather  than  any  inherent  superiority. 

In  on-line  detection,  every  installation  can  determine 
whether  or  not  allocating  one  of  its  data  resources  to  a 
process  residing  on  another  computer  will  lead  to  deadlock. 
This  is  facilitated  by  the  ready  availability  at  each 
installation of the system graph and the reachable sets,
which  are  continually  updated  as  edges  are  added  or  deleted. 
The  data  resource  allocation  decision  is  transmitted  by  the 
access  controller  at  the  installation  concerned  to  all 
others  in  the  network.  Thus,  maintaining  and  updating  the 
system  graph  for  global  interactions,  at  each  installation, 
requires  a  low  level  of  continual  communication. 
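The local decision described above can be sketched as a test that an access controller might run before transmitting an allocation. The function name and the graph are hypothetical. Granting a resource replaces the PRW edge (P, D) by the RPA edge (D, P); any cycle this creates must pass through (D, P), so it suffices to ask whether D is reachable from P once the satisfied PRW edge is removed. The example mirrors the shared-access situation of Section 3.2: granting a second shared access can close a cycle through a process waiting for exclusive access.

```python
def reachable_set(edges, node):
    """R(node) by depth-first search over a dict of successor sets."""
    seen, stack = set(), list(edges.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(edges.get(current, ()))
    return seen


def grant_would_deadlock(edges, process, resource):
    """Would adding the RPA edge (resource, process) close a circular wait?"""
    trial = {n: set(s) for n, s in edges.items()}   # leave the real graph untouched
    trial.get(process, set()).discard(resource)     # the PRW edge is satisfied
    return resource in reachable_set(trial, process)


edges = {
    "D1": {"P2"},        # P2 holds D1 for shared access
    "P3": {"D1"},        # P3 waits for exclusive access to D1
    "D2": {"P3"},        # P3 holds D2
    "P1": {"D2", "D1"},  # P1 waits for D2 and requests shared access to D1
    "P4": {"D1"},        # P4 requests shared access to D1 and waits for nothing else
}
print(grant_would_deadlock(edges, "P1", "D1"))  # True
print(grant_would_deadlock(edges, "P4", "D1"))  # False
```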

The on-line detection of deadlocks need be considered only for the following complete set of process-resource interactions:

a) a new process enters the system;

b) a new data resource is accessed;

c) a process runs to completion and releases the data resources held³;

d) a process in the system requests access to a data resource held by another process; and

e) a data resource held by a process is preempted from it.


3: In order to retain the "strong consistency" result of Eswaran et al. [1976], which requires that processes be "well-formed" and "two-phase", a process is required to be divided into growing and shrinking phases. The first unlock action marks the beginning of the shrinking phase, after which a process cannot issue a lock request on any entity in the database until the release of all entities held by the process.

The actual implementation of a two-phase protocol (as in SYSTEM R) is to release all data resources held at the completion of the process. (Private communication from J.N. Gray, IBM Research Laboratory, San Jose, Calif., U.S.A., February 1979.)
3.6 Resolution of Process-Resource Interactions

(a) A new process enters the system and/or

(b) A new data resource is accessed: New process and/or data resource entry into the system introduces the respective nodes into $G_s$. An RPA edge is added to $G_s$ whenever a new data resource is accessed, either by an entering process or by one already in the system. The request by an entering process for a data resource already in the system may not be granted, thus introducing a PRW edge. In either case, a deadlock-free system continues to be so, since no process is yet waiting for a newly entered process or a newly accessed data resource, and hence neither can lie on a cycle of $G_s$.

Assertion 1: If the system in a network configuration is deadlock-free, a new process entry into the system does not lead to deadlock in a circular wait condition. ■

Assertion 2: If the system in a network configuration is deadlock-free, accessing a new data resource does not lead to deadlock in a circular wait condition. ■

(c) A process in the system runs to completion and releases all data resources held: The process node, and all released data resource nodes which have no waiting-access requests, are deleted from $G_s$. Released data resources with a single waiting-access request are allocated to the corresponding processes. However, an allocation decision for a released data resource with multiple waiting-access requests is made in the manner indicated in Example 3.1 and according to the condition derived in Lemma 2, to avoid a potential deadlock.
Assertion 3: If the system in a network configuration is deadlock-free, then neither releasing the data resources held by a completed process for which there are no requests, nor allocating the released data resources for which there is a single waiting-access request, leads to deadlock in a circular wait condition. ■

Example 3.1: For the configuration of Figure 3.1, let us assume that $P_2$ requests access to a data resource held by $P_1$, thus introducing a PRW edge from $P_2$ to that data resource in $G_s$. Let $P_3$, which holds $D_3$, $D_4$ and $D_6$, run to completion. $D_6$ has no requests and hence is deleted from the system graph. $D_3$ is being waited for by $P_1$ and is allocated to $P_1$, whereas $D_4$ has two waiting-access requests, from $P_1$ and $P_4$. Assume that $P_4$ issued its request before $P_1$ did. If $D_4$ is allocated to $P_4$ in a FIFO manner, then processes $P_1$, $P_2$ and $P_4$ will be deadlocked. It is obviously more advantageous to allocate $D_4$ to $P_1$ and let $P_1$ proceed than to allocate $D_4$ to $P_4$ and have these processes deadlocked. This is crucial, especially when rollback and recovery in a network environment is expensive. Therefore, in Lemma 2, we give a necessary and sufficient condition to recognize such a situation and to avoid deadlock accordingly. Corollary 3 to Lemma 2 states that, in the case of a deadlock-free system with multiple processes waiting for access to a released data resource, there exists at least one process such that an allocation made to this process keeps the system deadlock-free. When multiple processes wait in FIFO order for access to a released data resource, the allocation is made to the first process that keeps the system deadlock-free. Further improvement may be possible by allocating the resource to the first process which not only keeps the system deadlock-free but also has the minimum number of waiting-access requests on other data resources.

Lemma 2: Let $P_i$ and $P_j$ be any two processes in a deadlock-free system with waiting-access requests to the data resource $D_k$. The allocation of the data resource $D_k$ to $P_i$ causes deadlock in a circular wait condition if and only if $P_j \in R(P_i)$ before the allocation.

Proof:

(<==) $(P_i, D_k)$ and $(P_j, D_k)$ are PRW edges in $G_s$. Since $P_j \in R(P_i)$, it follows that $R(P_i) \supseteq R(P_j)$. Moreover, there exists a path from $P_i$ to $P_j$ that does not use the edge $(P_i, D_k)$, because the released data resource $D_k$ has no outgoing RPA edge and so cannot lie on a path to $P_j$. Hence, deleting the edge $(P_i, D_k)$ and introducing the edge $(D_k, P_i)$ in $G_s$ to allocate $D_k$ to $P_i$ leads to a path from $P_j$ to $P_i$ through the edges $(P_j, D_k)$ and $(D_k, P_i)$. Therefore $P_i \in R(P_j)$, which implies $R(P_j) \supseteq R(P_i)$. But the path from $P_i$ to $P_j$ survives the allocation, so $R(P_i) \supseteq R(P_j)$ still holds, and hence $R(P_i) = R(P_j)$. Thus $P_i, P_j \in R(P_i) = R(P_j)$, and by Corollary 1 the processes are deadlocked.

(==>) After $D_k$ is allocated to $P_i$ the processes are deadlocked, implying that $P_i, P_j \in R(P_i) = R(P_j)$. We claim that $P_j \in R(P_i)$ before the allocation of $D_k$ to $P_i$. Assume the contrary, that is, $P_j \notin R(P_i)$ before the allocation. Then the allocation of $D_k$ to $P_i$ must have added $P_j$ to $R(P_i)$. This is impossible: the allocation deletes the edge $(P_i, D_k)$ and adds only the edge $(D_k, P_i)$, which leads back into $P_i$, so any path from $P_i$ that uses the new edge returns to $P_i$ and no process other than $P_i$ can be added to $R(P_i)$. This contradiction implies that $P_j \in R(P_i)$ before the allocation. ■

Corollary 3: Let $\{P_i\}_{1 \le i \le n}$ be the processes in a deadlock-free system with waiting-access requests to the data resource $D_k$. There exists at least one process $P_s$ $(1 \le s \le n)$ such that the reachable set of $P_s$ does not contain any process $P_i$, for all $i = 1, 2, \ldots, n$.

Proof: Assume the contrary. That is, there does not exist a process $P_s$ such that $P_i \notin R(P_s)$ for $i = 1, 2, \ldots, n$. This in turn implies that for any process $P_s$ $(1 \le s \le n)$ there exists at least one process $P_j$ such that $P_j \in R(P_s)$, where $j \in \{1, 2, \ldots, n\}$, and hence $R(P_s) \supseteq R(P_j)$.

Intuitively, there exists no process in the set whose reachable set does not contain that of another process in the set. Since there is a finite number of processes, following such a chain $P_s, P_j, \ldots$ must eventually return to a process already visited; hence for some pair we have $P_s \in R(P_j) \supseteq R(P_s)$, which gives $P_j, P_s \in R(P_j) = R(P_s)$.

By Theorem 1, this is a deadlock situation, which contradicts the assumption that the system is deadlock-free. ■

Remarks: The processes in the set $S = \{P_i\}$, $1 \le i \le n$, waiting for a resource $D_k$ in a deadlock-free system can be topologically sorted.^4 The allocation of $D_k$ to any of the processes which do not precede any other process in such a topological sort of the set S keeps the system deadlock-free. Typically, the processes to which the allocation can be made have minimum-cardinality reachable sets. However, it is possible to find a process in S which does not precede any other process in the topological sort of S, but which has a reachable set of non-minimal cardinality by virtue of its waiting for processes other than those in the set S (since we allow more than one outstanding request for a process).

^4: For topological sorting refer: Knuth, D.E., Fundamental Algorithms, The Art of Computer Programming, Vol. 1, Addison-Wesley Publishing Co., Reading, Mass., 1972, pp. 258-265.
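A small Python sketch of the remark, under our reading that $P_i$ precedes $P_j$ when $P_j \in R(P_i)$ (so that granting $D_k$ to $P_i$ would be unsafe by Lemma 2). It uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+); the function name `order_waiters` and the example sets are hypothetical. In a deadlock-free system the relation is acyclic, so `static_order()` should not raise `CycleError`.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def order_waiters(R, waiters):
    """Topologically sort the waiting set S, taking P_i to precede P_j when
    P_j is in R(P_i); also return the processes that precede no other one."""
    # graphlib expects each node mapped to the set of its predecessors.
    preds = {p: {q for q in waiters if q != p and p in R[q]} for p in waiters}
    order = list(TopologicalSorter(preds).static_order())  # CycleError if deadlocked
    # Safe targets for the allocation: no other waiter in their reachable set.
    safe = [p for p in waiters if not any(q != p and q in R[p] for q in waiters)]
    return order, safe


R = {"P1": {"D1", "D2", "P2"}, "P2": {"D1"}}
print(order_waiters(R, ["P1", "P2"]))  # (['P1', 'P2'], ['P2'])
```

The `safe` list is exactly the set of processes that Corollary 3 guarantees to be non-empty.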

The  situation  in  Example  3.1  arises  because  process 
has  waiting-access  requests  for  both  D2  and  D^.  In  this 
case our scheme detects a potential deadlock and avoids it accordingly, by virtue of Lemma 2. The potential for a query to be waiting for access to two data resources is illustrated in Appendix B. It is unrealistic to restrict a process to only one outstanding request, yet this has been the case in the approaches of earlier authors, including the one by Goldman. Thus, our approach combines detection and avoidance principles in deadlock handling, and deals with multiple waiting requests in a realistic way.

In operating systems, a process was allowed to have more than one outstanding request at a time; however, if the system was unable to satisfy all the outstanding requests at once, the process was required to drop them [Holt, 1972]. This in turn ruled out the occurrence of a situation analogous to case (c) developed in our approach. Consequently, researchers of the deadlock problem in databases did not allow more than one outstanding request, thus avoiding the situation in which an allocation of a released resource (following the completion of a process) leads to a deadlock. Because requests in databases are data-driven and content-based, the possibility of multiple outstanding requests is high. Thus, the results and the method suggested in this chapter to handle case (c) are new and original. Also, the combination of both the detection and avoidance principles to provide a mixed solution is the first of its kind in database systems.

(d) A process in the system requests access to a data resource held by another process: Since the request cannot be granted, the introduction of the PRW edge can lead to a cycle in $G_s$, meaning a deadlock. In this case, the reachable sets are updated appropriately, and a test for deadlock is carried out.

(e) Preemption of data resources held by a process: Preempting data resources is done when a process is aborted in an attempt to break a system deadlock. In such a case, the waiting-access requests of the aborted process are dropped, and the data resources held by the process are released.

Typical criteria for the selection of process(es) to be aborted are outlined below. However, the algorithms for selecting the process are non-trivial, and are not dealt with here.
(i) Abort a process which holds the minimum number of data resources for exclusive access (preferably none), since this can result in reduced rollback costs.

(ii) Of all the deadlocked processes, abort the one which has used the minimum CPU time.

(iii) Abort a process whose rollback involves a single installation, in preference to one whose termination leads to global rollback and consequent communication overhead.

(iv) Abort a process which has modified as few data resources as possible, and has interacted with other processes as little as possible, to minimize the cost of rollback.
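The text deliberately leaves the selection algorithm open. Purely as an illustration of how criteria (i) to (iv) could be combined, the Python sketch below ranks candidates lexicographically. The priority order (iii), then (i), then (iv), then (ii), the `Candidate` fields and `choose_victim` are all hypothetical choices, not a method proposed in the text.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    exclusive_held: int    # criterion (i): data resources held for exclusive access
    cpu_time: float        # criterion (ii): CPU time used so far
    global_rollback: bool  # criterion (iii): True if rollback spans installations
    modified: int          # criterion (iv): data resources modified
    interactions: int      # criterion (iv): interactions with other processes


def choose_victim(deadlocked):
    """Hypothetical ranking: local rollback first (iii), then (i), (iv), (ii).
    False sorts before True, so single-installation rollbacks win ties on (iii)."""
    return min(deadlocked, key=lambda c: (c.global_rollback, c.exclusive_held,
                                          c.modified, c.interactions, c.cpu_time))


cands = [Candidate("P1", 2, 5.0, False, 3, 1),
         Candidate("P2", 0, 9.0, True, 0, 0),
         Candidate("P3", 0, 7.5, False, 1, 2)]
print(choose_victim(cands).name)  # P3
```

A weighted cost function would be an equally plausible design; the lexicographic key simply makes the priority among the criteria explicit.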

Also, different analytic models, strategies, and approaches to rollback and recovery appear in [Chandy and Ramamoorthy, 1972; Chandy et al., 1975; Maryanski and Fisher, 1977].

3.7  "On-line"  Deadlock  Detection  Algorithms 

Bayer [1975, 1976] has presented and analyzed an on-line transitive closure algorithm for deadlock discovery in databases. To our knowledge, on-line deadlock detection algorithms for distributed DBMS have not yet been proposed. In this section, we present procedures for updating reachable sets as interactions go on in the system. Further, these procedures are used in developing an on-line deadlock detection technique.


The step of prime importance in the on-line detection of deadlocks is updating the reachable sets of the nodes in $G_s$. Maintaining $G_s$ is fairly trivial, but calculating the reachable sets of all nodes from scratch every time an edge is added or deleted can be comparatively expensive. Since only part of the reachable sets needs to be modified as edges are added to or deleted from $G_s$, it is sufficient to design a good on-line algorithm for updating them.


Let N be the set of nodes of $G_s$, and let $R(N_i)$ be the reachable set of node $N_i \in N$ for all i, $1 \le i \le |N|$. ALGORITHM A, presented below, updates the reachable sets of $G_s$, given that a new edge $(N_x, N_y)$ is added to the system. In step A1 of the algorithm, the reachable set of the node $N_x$ is updated (if $N_x$ existed in $G_s$) or created (if $N_x$ is an entering node). In step A2, we update the reachable sets of all those nodes whose reachable sets originally contained $N_x$. Step A2 is never executed if $N_x$ is an entering node, since no reachable set need be updated. Step A3 updates the system graph $G_s$.


ALGORITHM A

Input: (N_x, N_y), the new edge added to the system graph G_s = (N, E).

Output: Updated reachable sets of all nodes in G_s, and updated G_s.

A1: [Update reachable set of N_x]
      if N_x ∉ N then R(N_x) ← ∅;
      if N_y ∉ N then R(N_y) ← ∅;
      R(N_x) ← R(N_x) ∪ R(N_y) ∪ {N_y};

A2: [Update reachable sets of all nodes]
      if N_x ∈ N then
      begin
          for i = 1 until |N| do
              if N_x ∈ R(N_i) then R(N_i) ← R(N_i) ∪ R(N_x);
      end;

A3: [Update the system graph G_s]
      if N_x ∉ N then N ← N ∪ {N_x};
      if N_y ∉ N then N ← N ∪ {N_y};
      E ← E ∪ {(N_x, N_y)};
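ALGORITHM A translates almost line for line into Python. This is a sketch assuming the same set-of-strings representation used above: `nodes` is N, `edges` is E, and `R` maps each node to its reachable set; the name `add_edge` is ours. Note that A2 must test whether $N_x$ existed before A3 adds it to N.

```python
def add_edge(nodes, edges, R, x, y):
    """ALGORITHM A (sketch): update reachable sets when edge (x, y) is added."""
    x_existed = x in nodes
    # A1: entering nodes start with empty reachable sets; x now reaches y and R(y).
    R.setdefault(x, set())
    R.setdefault(y, set())
    R[x] |= R[y] | {y}
    # A2: every node whose reachable set already contained x also gains R(x).
    if x_existed:
        for n in nodes:
            if x in R[n]:
                R[n] |= R[x]
    # A3: record the new nodes and the new edge in G_s = (N, E).
    nodes |= {x, y}
    edges.add((x, y))


nodes, edges, R = set(), set(), {}
for x, y in [("P1", "D1"), ("D1", "P2"), ("P2", "D2")]:
    add_edge(nodes, edges, R, x, y)
print(sorted(R["P1"]))                 # ['D1', 'D2', 'P2']
add_edge(nodes, edges, R, "D2", "P1")  # closes the cycle P1 -> D1 -> P2 -> D2 -> P1
print("P1" in R["P1"])                 # True
```

Because R(x) does not change during A2, the order in which the nodes are visited does not matter.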

Although it is uncomplicated to update the reachable sets when an arbitrary edge is added to $G_s$, it seems almost impossible to do so when an arbitrary edge is deleted from $G_s$; no better method than recalculating the reachable sets seems feasible. On close examination, however, it becomes apparent that the only times when edges are deleted from $G_s$ are:

(i) a process runs to completion and releases all data resources held; and

(ii) a deadlock is discovered, and at least one of the processes involved must be aborted, implying that all data resources held by the aborted process(es) must be released, and all access requests from the process(es) are to be dropped.

In case (i) the process is obviously not blocked, and hence the corresponding process node in $G_s$ is a sink. Thus, the edges dropped are only those that are directed to a sink. This is a very simple case, and hence an algorithm can be devised to update the reachable sets. For the deadlock situation of case (ii), by contrast, there exists no sink in $G_s$. Therefore, aborting a process and rolling it back requires us to recalculate the reachable sets. Maintenance of these sets by incremental updates considerably decreases the chances of synchronization error and of the system graph becoming obsolete in all interactions (a) to (d) discussed earlier. However, for interaction (e) complete reconstruction of the reachable sets is necessary.

We present ALGORITHM T to update the reachable sets of all nodes given that an edge $(N_w, N_z)$ is deleted, where $N_z$ is a sink in $G_s$. In updating $G_s$, the edge $(N_w, N_z)$ is deleted from E, but the appropriate node deletions are done elsewhere (in the algorithm that is to be proposed for on-line detection of deadlocks).

ALGORITHM T

Input: Edge (N_w, N_z) deleted from the system graph, where N_z is a sink.

Output: Updated reachable sets of all nodes in G_s, and updated E of G_s.

T1: [Update reachable sets of all nodes]
      for i = 1 until |N| do
          if N_z ∈ R(N_i) then R(N_i) ← R(N_i) − {N_z};

T2: [Update the set of edges E of G_s]
      E ← E − {(N_w, N_z)};
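A Python sketch of ALGORITHM T, with the same assumed representation as before and a hypothetical function name. Because $N_z$ is a sink, nothing beyond $N_z$ itself can disappear from any reachable set. Dropping $N_z$ everywhere is exact only when $(N_w, N_z)$ is the last edge into $N_z$; when other edges into $N_z$ remain, the caller must re-add them, as step S2d of ALGORITHM S does.

```python
def delete_sink_edge(nodes, edges, R, w, z):
    """ALGORITHM T (sketch): delete edge (w, z), where z is a sink of G_s.
    Exact only if (w, z) was the only remaining edge into z."""
    # T1: z has no successors, so z itself is the only node that can vanish.
    for n in nodes:
        R[n].discard(z)
    # T2: drop the edge from E.
    edges.discard((w, z))


nodes = {"P1", "D1", "P2"}
edges = {("P1", "D1"), ("D1", "P2")}
R = {"P1": {"D1", "P2"}, "D1": {"P2"}, "P2": set()}
delete_sink_edge(nodes, edges, R, "D1", "P2")  # P2 completes and releases D1
print(R["P1"], R["D1"], edges)  # {'D1'} set() {('P1', 'D1')}
```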

We now propose ALGORITHM S to detect deadlocks on-line, for all cases discussed in Section 2.5. ALGORITHMS A and T are extensively used in ALGORITHM S to update reachable sets as edges are added to or deleted from $G_s$. Step S1 of the algorithm updates reachable sets for a new process and/or data resource entry. Step S2 deals with the case in which a process runs to completion and releases all data resources held. In S2a, the reachable sets are updated by deleting the edges corresponding to the release of data resources. Step S2b updates the set of nodes in the system graph for the released data resources without any waiting-access requests. Allocation of released data resources with a single waiting-access request is done in S2c. In step S2d, for a set of processes $\{P_w\}$ waiting for a released data resource, the condition in Lemma 2 is tested successively between pairs of processes until a process is found whose reachable set contains no other process in $\{P_w\}$.^5 The data resource should be allocated to such a process to avoid a potential deadlock. In step S3, the case when a process requests access to a data resource held by another process is dealt with. Step S4 deals with the case when a process is aborted to break a deadlock. In step S5 we carry out a test for the existence of deadlock, for the cases dealt with in steps S3 and S4.

^5: An alternative method is to allocate the resource to a process with a minimal reachable set in $\{P_w\}$, as discussed in the remarks following Corollary 3.


ALGORITHM  S 

S1: [A process enters the system, or a new data resource is accessed] As a consequence of a new process entry, or accessing a new data resource, or both, let the new edge added be $(N_i, N_j)$. Apply ALGORITHM A with the edge $(N_i, N_j)$ as input, and STOP.

S2: [A process in the system runs to completion and releases all data resources held]

S2a: [Update reachable sets by deleting each edge] Let the process which runs to completion be $P$, and the data resources released be $D_{j_1}, D_{j_2}, \ldots, D_{j_s}$. Apply ALGORITHM T with the edge $(D_{j_i}, P)$ as input, for all i, $1 \le i \le s$. Delete $P$ from the set of nodes N of $G_s$.

S2b: [Update $G_s$ by deleting, from N, released data resource nodes without any waiting-access requests] If there is no edge directed to $D_{j_i}$, then delete $D_{j_i}$ from N, for all i, $1 \le i \le s$.

S2c: [Allocate released data resources with a single waiting-access request] For every released data resource $D_{j_x}$, for some $x \in \{1, 2, \ldots, s\}$, with a single waiting-access request from process $P_y$ (say), apply ALGORITHM T with $(P_y, D_{j_x})$ as input, and apply ALGORITHM A with $(D_{j_x}, P_y)$ as input.
S2d: [Allocate released data resources with multiple waiting-access requests] For every released data resource $D_{j_x}$, for some $x \in \{1, 2, \ldots, s\}$, with multiple waiting-access requests from a set of processes $P = \{P_{y_1}, P_{y_2}, \ldots, P_{y_m}\}$, find a process $P_{y_k}$ such that $P_{y_i} \notin R(P_{y_k})$ for all i, $1 \le i \le m$. Apply ALGORITHM T with $(P_{y_k}, D_{j_x})$ as input.^6 Apply ALGORITHM A with $(P_{y_i}, D_{j_x})$ as input for all i such that $i \ne k$ and $1 \le i \le m$. Apply ALGORITHM A with $(D_{j_x}, P_{y_k})$ as input. STOP.


S3: [A process in the system requests access to a data resource held by another process] Let the process be $P_i$, and the data resource be $D_j$. Apply ALGORITHM A with the edge $(P_i, D_j)$ as input. GO TO STEP S5.


S4: [A data resource held by a process is preempted from it to break a deadlock] Let the aborted process be $P_j$. Delete from the set of edges E of $G_s$ all edges directed to $P_j$, and all edges directed from $P_j$. Apply step B1 of ALGORITHM B with $G_s$ as input, and GO TO STEP S5.


S5: [Detect deadlock (if any)] Apply step B2 of ALGORITHM B to detect deadlocks (if any). STOP.
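Step S2 is where detection and avoidance meet, so it is worth seeing end to end. The Python sketch below bundles ALGORITHMS A and T into a hypothetical `SystemGraph` class and implements S2a to S2d as a `complete` method. It assumes that each released resource was held only by the completing process (shared allocation is not modeled), that a completing process has no outgoing edges, and that the example nodes are illustrative only.

```python
class SystemGraph:
    """Sketch of G_s: (P, D) means P waits for D, (D, P) means D is allocated to P."""

    def __init__(self):
        self.nodes, self.edges, self.R = set(), set(), {}

    def add_edge(self, x, y):  # ALGORITHM A
        x_existed = x in self.nodes
        self.R.setdefault(x, set())
        self.R.setdefault(y, set())
        self.R[x] |= self.R[y] | {y}
        if x_existed:
            for n in self.nodes:
                if x in self.R[n]:
                    self.R[n] |= self.R[x]
        self.nodes |= {x, y}
        self.edges.add((x, y))

    def delete_sink_edge(self, w, z):  # ALGORITHM T (z must be a sink)
        for n in self.nodes:
            self.R[n].discard(z)
        self.edges.discard((w, z))

    def waiters(self, d):
        return sorted(p for (p, q) in self.edges if q == d)

    def complete(self, p):  # STEP S2: process p finishes and releases everything
        released = sorted(d for (d, q) in self.edges if q == p)
        for d in released:  # S2a
            self.delete_sink_edge(d, p)
        self.nodes.discard(p)
        self.R.pop(p, None)
        for d in released:
            w = self.waiters(d)
            if not w:  # S2b: no waiting-access requests
                self.nodes.discard(d)
                self.R.pop(d, None)
            elif len(w) == 1:  # S2c: a single waiting process
                self.delete_sink_edge(w[0], d)
                self.add_edge(d, w[0])
            else:  # S2d: Lemma 2 test; Corollary 3 guarantees a candidate exists
                k = next(q for q in w if not any(o != q and o in self.R[q] for o in w))
                self.delete_sink_edge(k, d)
                for o in w:
                    if o != k:
                        self.add_edge(o, d)  # restore d in R(o), cf. footnote 6
                self.add_edge(d, k)


g = SystemGraph()
for e in [("D1", "P3"), ("D2", "P3"), ("D5", "P3"), ("P1", "D1"),
          ("P2", "D1"), ("P4", "D2"), ("P1", "D3"), ("D3", "P2")]:
    g.add_edge(*e)
g.complete("P3")
print(("D1", "P2") in g.edges, ("D2", "P4") in g.edges, "D5" in g.nodes)  # True True False
```

Here P1 waits for both D1 and D3 (held by P2), so granting D1 to P1 would deadlock; the Lemma 2 test routes D1 to P2 instead, while D2 goes to its only waiter P4 and the unrequested D5 is dropped.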


For the purpose of illustration of the on-line deadlock detection scheme, we utilize the network configuration of Figure 3.1. We assume the introduction of a PRW edge from P2 to D^, consequent to P2 requesting access to it, and an RPA edge from D7 to P,_ as a result of P,_ requesting shared access to D7, which is also held for shared access by P^.

^6: This has the effect of deleting $D_{j_x}$ from the reachable sets of all $P_{y_i}$ for $1 \le i \le m$. Thus the reintroduction of the edges $(P_{y_i}, D_{j_x})$ for all i such that $i \ne k$ and $1 \le i \le m$ is necessary.

Illustration  of  ALGORITHM  S 

S1: [A process enters the system, or a new data resource is accessed, or both] Let the new edge added be: (P0, D1) (new process entry), or (D0, P2) (new data resource entry), or (D0, P0) (both). Apply ALGORITHM A with (P0, D1), or (D0, P2), or (D0, P0) as input. STOP.

S2: [A process in the system runs to completion and releases all data resources held] Let the process which runs to completion be P3, and the data resources released be D6, D3, and D4.

S2a: [Update reachable sets by deleting each edge] Apply ALGORITHM T with (D6, P3) as input; apply ALGORITHM T with (D3, P3) as input; apply ALGORITHM T with (D4, P3) as input; delete P3 from $G_s$.

S2b:  [Update  G  by  deleting  released  data  resource  nodes 

s 

without  any  requestsl  Delete  D^  from  Gg. 

S2c:  [Allocate  released  data  resources  with  a  single 
wa i ting-access  requestl  Apply  ALGORITHM  T  with 
(p^,D  )  as  input;  Apply  ALGORITHM  A  with  (D^P^)  as 
input ; 

S2d: [Allocate released data resources with multiple waiting-access requests] For the released data resource D4, the condition in Lemma 2 dictates its allocation to P1, rather than to P4. Apply ALGORITHM T with (P1, D4) as input.^7 Apply ALGORITHM A with (P4, D4) as input; apply ALGORITHM A with (D4, P1) as input. STOP.

S3:  [A  process  in  the  system  requests  access  to  a  data 

resource  held  by  another  processl  Let  the  process  be 
Pg ,  and  the  data  resource  be  Dg .  Apply  ALGORITHM  A  with 
(P0,D0)  as  input.  GO  TO  STEP  S5. 

o  o 

S4:  f A  data  resource  held  by  a  process  is  preempted  from  it, 

to  break  a  deadlock]  Let  the  aborted  process  be  P^. 

Delete  from  G„  the  edges  (DQ,P_),  (D_,P_),  and  (P_,D_). 

s  o  d  /  d  oy 

Calculate the reachable sets of all nodes of $G_s$. GO TO STEP S5.

S5: [Detect deadlock (if any)] Apply step B2 of ALGORITHM B to detect deadlocks (if any). STOP.
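ALGORITHM B is not part of this section, so the Python sketch below is only a stand-in under stated assumptions. Step B1 is modeled as recomputing every reachable set from scratch by depth-first search, and step B2 as reporting the processes that lie in their own reachable sets, i.e. on a cycle of $G_s$ (our reading of Theorem 1 and of the proofs above). The function names are hypothetical. The example also shows a process that is blocked but not deadlocked, in the sense of Corollary 2.

```python
def reachable_sets(nodes, edges):
    """Stand-in for step B1: recompute R(N) for every node from scratch."""
    succ = {n: [] for n in nodes}
    for x, y in edges:
        succ[x].append(y)
    R = {}
    for start in nodes:
        seen, stack = set(), list(succ[start])
        while stack:  # iterative depth-first search
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(succ[n])
        R[start] = seen
    return R


def deadlocked(R, processes):
    """Stand-in for step B2: a process is deadlocked if it can reach itself."""
    return {p for p in processes if p in R[p]}


def abort(edges, p):
    """Step S4: drop every edge directed to or from the aborted process p."""
    return {(x, y) for (x, y) in edges if p not in (x, y)}


nodes = {"P1", "P2", "P3", "D1", "D2", "D3"}
edges = {("P1", "D1"), ("D1", "P2"), ("P2", "D2"), ("D2", "P1"),  # cycle
         ("P3", "D3"), ("D3", "P1")}  # P3 is blocked behind the cycle
procs = {"P1", "P2", "P3"}
print(sorted(deadlocked(reachable_sets(nodes, edges), procs)))  # ['P1', 'P2']
edges = abort(edges, "P2")
print(deadlocked(reachable_sets(nodes, edges), procs))  # set()
```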

3.8 Discussion

3.8.1  Communication  Requirements 

In distributed database performance, communication time is a critical factor to be optimized. Consequently, inter-installation communication in a distributed DBMS can result in conspicuous performance degradation. Thus, the impact of inter-computer communication upon system performance should be an important consideration in the treatment of deadlocks in distributed systems. Not only is an efficient network communication mechanism essential, but also an effective deadlock handling technique with minimal communication requirements.

^7: This has the effect of deleting D4 from the reachable set of P4, and from those of the nodes from which D4 was accessible. Thus, the reintroduction of the edge (P4, D4) is necessary.

In the approach proposed in this chapter to detect deadlocks, the communication needs are quite modest. Every time a data resource is allocated to a process, a process releases a data resource, or a process is made to wait for a data resource, the access controller at the installation in which the data resource resides sends information to that effect to all other installations in the network. Such communication is necessary only for processes and data resources which are global in nature, as discussed in Sections 3.4 and 3.5. This facilitates maintaining the system graph at each installation. Thus, the reachable sets are maintained and updated at each installation to detect deadlock, without the necessity of transmitting huge tables among installations.
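A toy Python model of this message pattern, assuming that every installation keeps a replica of the edge set of $G_s$ and that delivery is reliable and ordered. The event names `wait`, `allocate` and `release`, the `Installation` class and `broadcast` are hypothetical. Each event costs n-1 short messages carrying a single edge, rather than a table.

```python
from dataclasses import dataclass, field


@dataclass
class Installation:
    name: str
    edges: set = field(default_factory=set)  # local replica of the edges of G_s

    def apply(self, kind, p, d):
        if kind == "wait":        # p must wait for d: PRW edge (p, d)
            self.edges.add((p, d))
        elif kind == "allocate":  # d granted to p: drop any wait edge, add RPA edge
            self.edges.discard((p, d))
            self.edges.add((d, p))
        elif kind == "release":   # p releases d
            self.edges.discard((d, p))


def broadcast(sites, home, kind, p, d):
    """Apply the update at d's home installation and count the messages sent."""
    home.apply(kind, p, d)
    others = [s for s in sites if s is not home]
    for s in others:
        s.apply(kind, p, d)  # one short message per remote installation
    return len(others)


sites = [Installation("A"), Installation("B"), Installation("C")]
sent = broadcast(sites, sites[0], "allocate", "P1", "D1")  # D1 resides at A
sent += broadcast(sites, sites[0], "wait", "P2", "D1")
print(sent, all(s.edges == sites[0].edges for s in sites))  # 4 True
```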

In the approaches by Chu and Ohlmacher [1974] and Maryanski [1977], process sets and shared data lists, respectively, are transmitted over communication channels to each installation, resulting in heavy message traffic.

Chandra et al. [1974] transmit resource tables, which contain information pertaining to processes allocated local resources, processes waiting for access to local resources, local processes allocated remote resources, and local processes waiting for access to remote resources.

Similarly, Mahmoud and Riordon [1977], in their distributed approach for a network of n computers, require the transmission of (n-1) identical messages containing the status and queues of files. Also, each installation receives (n-1) different messages from the other installations. In other words, all these approaches basically require the transmission of large tables among installations, resulting in tremendous communication. Goldman [1977], in his approach, requires the creation of an OBPL, which is expanded at each installation and transmitted from there to the next. The expansion and transmission of OBPLs continues until a decision is reached as to whether or not a deadlock exists. Even though Goldman's approach is better than those of the other authors discussed above, the OBPLs go through several installations sequentially, or several times between the same two installations, until a deadlock is detected. Thus, the approach could result in an undue waiting period, during which a totally new network situation could arise. All these approaches suffer from the great drawback that the size of the tables to be transmitted increases enormously as the unit of identifiable resource decreases in size (e.g., from files to records).

3.8.2 Responsibilities of Data Base Administrator

The complexity of the function of a data base administrator (DBA) increases enormously with the distribution of a DBMS over a network of computers. In a network environment, the actions of the operating system, which manages application jobs (processes), and those of the DBA, who maintains process integrity and the consistency of the database, have to be coordinated. A thorough understanding of the relationships among concurrency controls, processors, processes, deadlock handling and recovery techniques, communication aspects and protocols, etc., is necessary for the DBA as well as for the operating system. The need for such coordination may make it important to have a higher authority over both the DBA and the operating system. This area probably requires deeper study, especially in a network environment.

As more and more data is integrated over a network of computers, the database becomes accessible to a larger number of diverse application jobs, thus contributing to generality and flexibility. Simultaneously, several operations, like detection of deadlocks, recovery from deadlocks, communication requirements, etc., become extremely complex, demanding improved operational efficiency. This boils down to a classical trade-off between generality and efficiency, very common in commercial data processing. It is in balancing these two apparently conflicting factors that the role of the DBA assumes great importance. Very significantly, the decisions made by the DBA will have a substantial impact on a system on a network of computers should an on-line deadlock detection technique like the one proposed in this chapter be implemented.
3.9 Highlights of our Proposal

It is difficult to estimate the performance effects of deadlock detection or deadlock prevention in a distributed DBMS, since communication time is critical. Because of the complexity of distributed DBMSs, a significant factor in handling deadlocks would be operational efficiency. But the communication aspects make it impractical to estimate the performance of such algorithms analytically. Once distributed DBMSs become a common reality, experimental data can be gathered to measure their performance. Nevertheless, our proposal has the following advantages:

* In the deadlock detection approach proposed in this chapter, the communication needs are quite modest in the sense that they are incremental and dispersed over a period of time, as outlined in Sections 3.4, 3.5 and 3.8.1.

* The technique of deadlock detection suggested in this chapter identifies the processes directly responsible for the deadlock (Corollary 1). In general, it is possible to find a process blocked forever but not deadlocked, by virtue of its waiting for a process which is involved in a deadlock (Corollary 2).

* Processes are never delayed by our technique, because a process whose request for a data resource can be granted may proceed without waiting for the deadlock detection mechanism. The DBA can design a scheme to invoke the detection mechanism every 'X' units of time, or for every 'Y' accesses granted, or for every 'Z' accesses not granted, or any combination of these.

* In our approach, a process may have any number of outstanding requests simultaneously. In most other approaches [e.g., King and Collmeyer, 1973; Goldman, 1977], a process is restricted to at most one outstanding request. In real-world applications this restriction is not practical, as demonstrated in Appendix B. Thus our approach is more general and realistic.

* When a number of readers share a data resource, our algorithm does not require any special treatment, unlike Goldman's approach, where a separate copy of the OBPL is formed for each such reader. In the case of shared access, his approach leads to a heavy overhead in computation and in communication.

* Our proposal deals with every request in the same manner, and can be considered a unified approach, since the detection technique does not classify requests according to the relationship between the origin of the process and the installation of residence of the data resource accessed. In all the other approaches discussed earlier in this chapter, the algorithms deal with each access request according to the classification of the request.
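The invocation scheme mentioned in the third point can be made concrete. The Python sketch below is a hypothetical DBA policy (class and parameter names are ours) that runs detection after X time units, Y granted accesses, or Z denied accesses, whichever comes first. Granted requests never wait on it; it only decides when detection should run.

```python
from dataclasses import dataclass


@dataclass
class DetectionTrigger:
    """Hypothetical DBA policy: run detection every X time units, every Y
    granted accesses, or every Z accesses not granted, whichever comes first."""
    x_time: float
    y_granted: int
    z_denied: int
    last_run: float = 0.0
    granted: int = 0
    denied: int = 0

    def record(self, was_granted: bool, now: float) -> bool:
        """Record one access decision; return True when detection should run."""
        if was_granted:
            self.granted += 1
        else:
            self.denied += 1
        if (now - self.last_run >= self.x_time or self.granted >= self.y_granted
                or self.denied >= self.z_denied):
            self.last_run, self.granted, self.denied = now, 0, 0
            return True
        return False


t = DetectionTrigger(x_time=10.0, y_granted=3, z_denied=2)
events = [(True, 1.0), (True, 2.0), (False, 3.0), (False, 4.0), (True, 15.0)]
print([t.record(g, now) for g, now in events])  # [False, False, False, True, True]
```

The fourth event fires because two accesses were denied; the fifth fires because more than ten time units have passed since the last run.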
CHAPTER 4


SYSTEM  RECOVERY  IN  DISTRIBUTED  DATABASES 

4.1  Introduction 

For distributed databases, very interesting concurrency controls have independently been designed [Thomas, 1977; Bernstein et al., 1978; Rosenkrantz et al., 1978; Stonebraker, 1978]. In particular, Bernstein et al. propose an effective method to maintain the mutual consistency of multiple copies of databases and the internal consistency of each copy of the database. This method incorporates deadlock prevention principles. Excellent response time is guaranteed for all transactions that do not conflict. Transactions that cannot be shown to be nonconflicting need extensive coordination and, in general, substantial communication among installations is required as part of the deadlock prevention mechanism.

Little is known about the probability of interference or deadlock in concurrent database accesses. Peebles and Manning [1978] strongly believe that interference is rare in transaction processing systems, and do not support elaborate avoidance or prevention mechanisms, which they consider uneconomical. We agree, and have strongly advocated in Chapters 2 and 3 the use of deadlock detection in distributed systems. Further, to quote Le Lann [1978], "our conclusion will be that for systems which include a partitioned database and which provide for the storage of pending requests, maintenance of internal integrity boils down to a problem of deadlock avoidance or detection with distributed control". In view of Peebles and Manning's belief and Le Lann's conclusion, we feel that our on-line detection technique provides adequate measures to protect the database against process failures. However, neither our scheme nor that of Bernstein et al. is robust enough for system crashes.

Users of shared databases presume that the consistency and correctness of the information upon which they work is preserved under a wide variety of system malfunctions. In a network environment the problems faced in maintaining data accuracy are even more severe. In this chapter we present a robust approach for system recovery from crashes.


4.2  The  Problem,  Environment,  and  Basic  Strategy 


A system crash normally requires database reconstruction by, for instance, reloading a previous save and repeating the updates on the database from that checkpoint through use of an audit trail. In a batch environment such a procedure may be expensive, inconvenient and time-consuming. On the other hand, in real-time transaction [...] effects. For instance, after a successful reloading from a previous checkpoint, it is unreasonable to expect that users will remember long sequences of transactions. Even if they did remember, expecting them to perform their own recovery is not realistic. In some systems, continued processing after a partial recovery may not be a serious problem. For example, a supermarket inventory control system might as a consequence attempt to oversell a product, probably causing little more than some inconvenience and minor embarrassment.

On the other hand, if we are dealing with airplane seats, [...] withdrawals or paychecks, a partial recovery could have [...] repercussions.


In essence, we are considering the problem of reliable operation of a distributed data management system in the presence of failures. More formally, the problem is as follows. Given:

-- an arbitrary computer network with distributed control, and an appropriate communication system, and

-- an integrated database, appropriately partitioned and/or replicated, distributed over the network,

design a reliable method of system recovery that maintains database consistency during update transactions in the presence of either system failure or communication breakdown.

Properties  that  characterize  a  "good"  solution  are: 

— Simplicity:  The  techniques  used  for  detecting  a  failure 
and  for  recovery  must  be  simple',  and  should  not 
incorporate  any  elaborate  methods. 
— Tolerable overhead: Under normal circumstances, a high
overhead results in fast recovery, while a lower overhead
provides better performance but slower recovery. We stress
the importance of round-the-clock access to the on-line
system, and advocate that the overhead involved in
maintaining recovery data should be directly proportional
to the value or sensitivity of the data accessed.

— Consistency: Mutual and internal consistency of the
database should be maintained after recovery. In the
event of brief communication failures, an effective
procedure is needed for reconstructing a consistent
database from two or more isolated fragments.

A  complete  technical  solution  to  this  problem  in  a 
partitioned  network  is  not  in  sight. 

— Partial operability: The system should continue to operate
in  the  case  of  failures  at  one  or  more  installations. 

— Avoiding global rollback: As far as possible, rolling back
all  the  executing  transactions  to  some  common  checkpoint 
in  time  should  be  avoided. 

— Reliable  communication:  Guaranteed  delivery  of  messages  is 
a  necessary  requirement  to  ensure  reliable  recovery  and 
maintenance  of  consistency. 


Environment:

The  environment  with  respect  to  transactions, 
communication  aspects,  failure  and  recovery,  the 
timestamping  mechanism,  and  the  applicability  to  different 
types  of  networks  is  explained  in  detail: 
1. Transaction categories: The traffic processed by a
distributed database system is assumed to fall into one of
three broad categories. Special-purpose transactions
of the type found in airline reservation, medical
information, or banking systems require extra care
because of the sensitive nature of the application. Less
sensitive applications which involve dedicated or
self-contained equipment are categorized as local
transaction systems (e.g., warehouse inventory control,
payroll production, or student record systems). Finally,
the third category encompasses global transactions, which
access data stored in two or more different installations
via a communication line.

2. Communication aspects: Every message sent from an
installation is assumed to reach its destination
eventually. Messages which arrive at a destination
computer from a common source are processed in the order
of initiation. However, messages from two different
installations to a particular installation may be
processed in their order of arrival. To ensure that any
message sent from one installation to another is
eventually delivered, despite a finite possibility of
failure of both the source and the destination, the
concept of message spoolers [Hammer and Shipman, 1978] is
employed. For instance, if an installation N1 wants to
communicate a message to a crashed installation N2, N1
establishes several copies of its message at different
monotonically increasing value. Timestamps are used for
synchronization.

5. Networks: The approach is equally applicable to all
types of networks, from "store-and-forward" communication
(e.g., ARPANET [McQuillan and Walden, 1977]) at one end
of the spectrum, to "broadcast" networks (e.g., ETHERNET
[Metcalfe and Boggs, 1976]) at the other.

The  basic  strategy: 


Although jobs consist of retrieval and update
transactions, these two groups may be subclassified, and
recovery protocols may be defined that take advantage of
the known properties of each transaction class.

4.3  Transaction  Classification 

Transactions¹ are classified according to the degree to
which they require fast system recovery. The predefinition
of transaction classes can be performed by the Data Base
Administrator during database design, based either on
knowledge of the special nature of the data stored (e.g.,
an airline reservation system) or on statistics gathered
about database usage (e.g., predominantly local
transactions). The defined classes can then be matched
with the recovery protocols necessary to maintain the
system specification. The predefinition of transaction
classes does not prevent the system from accepting any
transaction, but rather facilitates the use of more
efficient and cost-effective recovery schemes for
transactions of known or predictable behaviour.

4.3.1  Retrieval  Transactions 

We classify retrieval transactions into two kinds.
Transactions that do not require read locks are referred
to as class T1. Transactions of class T1 are those that do
not care whether someone else is modifying the same data.
For example, in an automated library database, where the
majority of users are those who search for the presence of
a particular document, such a user is not disturbed by a
transaction which at the same time changes the status of
this document from "available for loan" to "loaned to Mr.
Smith". Such a user should be allowed to access the
database for reading even if the data is locked for
modification. This not only enhances concurrency in the
database, but also helps us to devise a simple and
inexpensive recovery protocol for a large number of
retrieval transactions of this kind. System performance is
also enhanced because locking costs are reduced.

1: "Processes" and "transactions" are used interchangeably.

Transactions of class T2, on the other hand, are those
that require locking of the shared data for reading, and
include all those that are not in T1. A transaction of
class T2 has to wait for access to any data that is locked
for modification; conversely, any data locked for reading
by a transaction of class T2 cannot simultaneously be
locked for modification by any other transaction.

4.3.2  Update  Transactions 

(a) Special-purpose systems, which basically serve one
specialized application but have lived up to expectations
as distributed database systems, are considered here.
Each specialized system has its own recovery
requirements.


In the case of information systems for which the
probability of failure is required to be small, such as
those needed for space shuttle applications, aircraft
flight control, and to some extent airline reservation
systems, speedy system recovery and data validation are
essential. Only very occasional small losses can be
tolerated. Thus, the recovery data and protocol are
quite elaborate. We shall refer to this class of
transactions as U1. For the space shuttle and air
traffic control applications, standby systems provide
further protection.

Another class of specialized systems includes those in
which the need for users to define their own information
is high, yet data sharing is very important: the data
files are owned locally by the creator of that data, but
are accessible to others through a network. For instance,
in a distributed medical information system, the producers
of information, such as physicians' offices, pharmacies,
laboratories and hospitals, are geographically separated.
Banking and telephone systems can also be classified
according to this kind of transaction, which we shall
refer to as U2. For computerized telephone systems, very
infrequent, isolated small breakdowns can be tolerated.
Likewise, in banking systems, significant processing and
storage facilities are typically incorporated into
computer terminals. Such terminals provide data input and
possibly limited data validation, even when the main
computer system is down. Moreover, transactions of class
U2 do not require recovery techniques as elaborate as
those for class U1.

(b) Systems with predominantly local transactions, for
instance a university student database, a payroll system,
or an inventory control system for a chain of
supermarkets, are classified as U3. Commonly, 95-99% or
more of the transactions are local [Stonebraker, 1978].

The transactions of class U3 tolerate system crashes
better than those of classes U1 and U2, and hence
fairly inexpensive recovery protocols and recovery
data maintenance are acceptable.

(c)  U4  is  the  class  of  transactions  which  access  data  stored 
(not  redundantly)  in  two  or  more  installations.  This 
class  may  include  Personnel  Management  and  Budgetary 
Accounting  systems.  Typically,  a  transaction  of  this 
class  accesses  data  stored  in  the  installation  where  it 
originates,  and  travels  to  another  for  further  processing 
and  so  on. 

(d) The class of transactions U5 is that which accesses data
stored (redundantly) in two or more installations,
handling fairly generalized transactions. An appropriate
update algorithm [Bernstein et al., 1978] that maintains
internal and mutual consistency of the database is
assumed. A transaction of this class accesses the data
stored in the installation where it originates, and
travels to the nearest installations where the redundant
data is stored for further processing, and so on. In
practice, the U4 class of transactions may be thought of
as a subset of U5.

(e)  U6  includes  the  transactions  that  were  not  anticipated 
ahead  of  time.  On  account  of  their  unpredictable  or 
unexpected  behaviour,  these  transactions  need  extensive 
audit  data  and  an  elaborate  protocol  for  system  recovery. 
Typically, transactions for new applications for which
there is insufficient classification data are included in
this class.

4.4  System  Recovery  Protocols 

The  approach  outlined  here  differs  from  many  other 
methods  because  it  does  not  assume  that  every  transaction 
requires  recovery  data  (redundant  data  stored  to  make 
recovery  possible)  maintained  in  the  form  of  "audit  trails" 
(recording  of  'who  did  what  to  whom,  when,  and  in  what 
sequence').  Instead,  appropriate  inexpensive  system 
recovery  protocols  are  proposed  for  transactions  whose 
behaviour can be predetermined.

If a transaction belongs to class T1, access is allowed
to the data to be retrieved irrespective of whether the data
is locked by any other transaction. The protocol in the
case of a transaction of class T2 requires waiting if the
data is locked for modification; the transaction is free
to proceed if the data is locked for shared use (reading).
Neither class T1 nor class T2 transactions require
maintenance of recovery data for crash resistance.
However, a mechanism should be provided to indicate to the
process:

(i)  its  wait  state,  if  waiting  for  locked  data;  or 

(ii) that the data requested is unavailable, if it resides
at a site which has crashed or if all communication links
to the site have failed.
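The retrieval protocol for classes T1 and T2, including the two conditions the process must be told about, can be summarized as a small decision function. The Python sketch below is illustrative only: the lock states, the function names read_decision and may_lock_for_update, and the string results are a hypothetical encoding of our own, not part of the protocol as stated.

```python
from enum import Enum


class Lock(Enum):
    NONE = "none"
    SHARED = "shared"        # read-locked by some T2 transaction
    EXCLUSIVE = "exclusive"  # locked for modification


def read_decision(txn_class: str, lock: Lock, site_reachable: bool = True) -> str:
    """Return "proceed", "wait", or "unavailable" for a retrieval request."""
    if not site_reachable:
        # The data resides at a crashed site, or all links to that site have failed.
        return "unavailable"
    if txn_class == "T1":
        return "proceed"  # T1 reads ignore locks altogether
    if txn_class == "T2":
        # T2 waits only for data locked for modification; shared locks are compatible.
        return "wait" if lock is Lock.EXCLUSIVE else "proceed"
    raise ValueError(f"unknown retrieval class: {txn_class!r}")


def may_lock_for_update(lock: Lock) -> bool:
    """Data read-locked by a T2 transaction (or already locked for update) cannot be locked for modification."""
    return lock is Lock.NONE
```

Because a T1 read never registers a lock, an update lock can be granted while T1 readers are active; only T2 read locks hold off modification.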

Recovery protocols for update transactions:

A. Recovery Protocol SP1:

This is required by transactions of class U1, which may
originate  from  remote  locations  and  terminals. 

1)  The  originating  transaction  is  uniquely  timestamped 
(i.e.,  two  transactions  starting  at  the  same  time  from 
different  sites  are  assigned  distinct  timestamps). 

2) Provided the timestamp of the accessed data is less than
that of the transaction, synchronize the clocks of the other
sites from which the transaction has to access data to the
timestamp of the transaction. Otherwise, reassign a higher
timestamp to the transaction, and repeat step 2.²

3) Modified data is timestamped³ at the site (say N1). An
audit trail is maintained by using a write-ahead log
protocol, which requires that the audit trail be written
to non-volatile storage before the database is updated
[Gray, 1978].

2: This step is included to provide the synchronization of
clocks necessary for updating replicated data in a
distributed database.

3: Timestamp with the system time at the moment of creation
of the entry.

4) If such modified data is redundantly stored (say at site
N2), then site N1 sends a timestamped update message to
be written to the audit trail at site N2. Subsequently,
the database is updated at site N2.

5) Periodic incremental dynamic dumping [Rosenkrantz, 1978]
of the database is carried out to provide checkpoints.⁴
Incremental dynamic dumping can be facilitated by
maintaining a differential file [Severance and Lohman,
1976].
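Steps 1 and 2 of SP1 can be sketched as follows. This is a minimal illustration under assumptions of our own: a timestamp is a (clock value, site id) pair compared lexicographically, which is one common way to make timestamps issued at the same clock value by different sites distinct, and "reassign a higher timestamp" is read as advancing the originating site's clock past the newest data timestamp seen. The names Site, new_timestamp, synchronize and admit are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Timestamp = Tuple[int, int]  # (clock value, site id): hypothetical encoding


@dataclass
class Site:
    site_id: int
    clock: int = 0

    def new_timestamp(self) -> Timestamp:
        """Step 1: the site id breaks ties, so two sites never issue equal timestamps."""
        self.clock += 1
        return (self.clock, self.site_id)

    def synchronize(self, ts: Timestamp) -> None:
        """Advance the local clock so that it never lags behind ts."""
        self.clock = max(self.clock, ts[0])


def admit(txn_ts: Timestamp, origin: Site,
          data_ts: Dict[int, Timestamp], sites: Dict[int, Site]) -> Timestamp:
    """Step 2: data_ts maps each site to be accessed to the timestamp of the data there."""
    while True:
        if all(ts < txn_ts for ts in data_ts.values()):
            # Synchronize the clocks of the sites the transaction accesses.
            for site_id in data_ts:
                sites[site_id].synchronize(txn_ts)
            return txn_ts
        # Otherwise reassign a higher timestamp (one past the newest data) and repeat.
        origin.synchronize(max(data_ts.values()))
        txn_ts = origin.new_timestamp()
```

For example, a transaction stamped (1, 1) at site 1 that must access data stamped (5, 2) at site 2 is restamped (6, 1), and site 2's clock is advanced to 6.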

Remarks: The timestamps uniquely identify the transactions
in the recovery data. An audit trail provides crash
resistance, allows any transaction to be backed out, and
permits certification of system integrity when necessary.
Incremental dumping can be carried out frequently to provide
recent checkpoints which, in conjunction with the audit
trail, help speed system recovery. Maintaining a
differential file as an "add set", a "delete set", and a
"change set" makes dynamic dumping easier.
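The write-ahead rule and the differential file can be combined in a short sketch. It is a minimal illustration, not the protocol's actual storage layout: a JSON-lines file stands in for the audit trail, each record is forced to disk with `os.fsync` before any in-memory structure changes, and pending changes are kept as add, delete and change sets until a merge produces the up-to-date database to be dumped as a checkpoint. The class and method names are hypothetical.

```python
import json
import os
from typing import Any, Dict, List, Set


class AuditTrail:
    """Append-only log; every record is forced to disk before append() returns."""

    def __init__(self, path: str) -> None:
        self.path = path

    def append(self, record: Dict[str, Any]) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())  # non-volatile before the caller touches any data

    def records(self) -> List[Dict[str, Any]]:
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f]


class DifferentialFile:
    """Changes held apart from the database as add, delete and change sets."""

    def __init__(self, trail: AuditTrail) -> None:
        self.trail = trail
        self.add_set: Dict[str, Any] = {}
        self.delete_set: Set[str] = set()
        self.change_set: Dict[str, Any] = {}

    def write(self, ts, key: str, value: Any, db: Dict[str, Any]) -> None:
        self.trail.append({"ts": ts, "op": "write", "key": key, "value": value})  # log first
        self.delete_set.discard(key)
        if key in db:
            self.change_set[key] = value
        else:
            self.add_set[key] = value

    def delete(self, ts, key: str) -> None:
        self.trail.append({"ts": ts, "op": "delete", "key": key})  # log first
        self.add_set.pop(key, None)
        self.change_set.pop(key, None)
        self.delete_set.add(key)

    def merge_into(self, db: Dict[str, Any]) -> Dict[str, Any]:
        """Return the up-to-date database (to be dumped as a checkpoint) and reset the file."""
        merged = {**db, **self.change_set, **self.add_set}
        for key in self.delete_set:
            merged.pop(key, None)
        self.add_set, self.delete_set, self.change_set = {}, set(), {}
        return merged
```

The database itself is untouched until `merge_into` runs, which is what lets SP2 and SP3 defer the merge (and the dump) to a predetermined time.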


4: For this case, a strategy for optimal checkpointing is
suggested in this chapter.
B. Recovery Protocol SP2:

Transactions  of  class  U2  require  a  less  elaborate 
protocol  than  SP1.  Incremental  dumping  here  need  not  be 
either  dynamic  or  frequent,  relative  to  SP1. 

1)  Same  as  step  1  of  SP1. 

2)  Same  as  step  2  of  SP1. 

3) Modified data is timestamped at the site (say N1), and
the recovery data is stored in the form of a differential
file. A write-ahead log protocol is used: the differential
file is first copied to non-volatile storage, before
process termination is certified.

4)  Sites  which  store  the  modified  data  redundantly  are  sent 
a  timestamped  update  message.  When  a  site  receives  such 
a  message,  the  update  communicated  is  written  to  the 
differential  file  maintained  at  that  site. 

5) At a predetermined point in time the differential file
is  merged  with  the  database  to  provide  an  up-to-date 
database,  which  is  then  dumped  for  use  as  a  checkpoint. 

C. Recovery Protocol SP3:

Class U3 transactions, which are predominantly local in
nature, require dumping only once every 24 hours or so. In order
to  provide  crash  resistance,  recovery  data  is  stored  in  the 
form  of  a  differential  file. 

1)  Timestamp  the  originating  transaction. 

2)  Store  recovery  data  in  the  form  of  a  differential  file 
maintained  at  that  site  using  a  write  ahead  log  protocol. 


3)  Once  every  24  hours  (or  as  determined  by  the  DBA)  merge 
the differential file with the database to update the
data  stored.  Checkpoint  the  database  after  every  such 
major  update. 

D.  Recovery  Protocol  SP4: 

Transactions  of  classes  U4  and  U5  use  this  protocol. 

An appropriate algorithm [Bernstein et al., 1978] for
updating  and  maintaining  consistency  of  the  database  in  a 
redundant  case  is  assumed. 

1)  Same  as  step  1  of  SP1. 

2)  Same  as  step  2  of  SP1. 

3)  Same  as  step  3  of  SP2. 

4) Same as step 4 of SP2.

5)  Same  as  step  5  of  SP2. 

E.  Recovery  Protocol  SP5: 

The class U6 of unanticipated transactions (whose
behaviour with respect to updating data cannot be
predetermined) uses the same protocol as SP4, except that

(i) an audit trail is maintained at step 3 of SP4 using a
write-ahead log protocol, and

(ii)  the  update  message  communicated  to  another  site  in  step 
4  is  first  written  to  the  audit  trail,  before  committing 
the  update. 
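The correspondence between update classes and recovery protocols described above can be collected into a lookup table. This is a sketch of our own reading of SP1 to SP5: the field names are hypothetical, the one-line summaries of the checkpointing regime paraphrase the text, and routing any unclassified update transaction to SP5 reflects the statement that transactions without sufficient classification data belong to U6.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protocol:
    name: str
    audit_trail: bool   # write-ahead audit trail (SP1, and SP5's addition to SP4)
    clock_sync: bool    # step 2 of SP1: synchronize clocks of the sites accessed
    checkpointing: str  # summary of the dumping regime


SP1 = Protocol("SP1", True, True, "frequent incremental dynamic dumping")
SP2 = Protocol("SP2", False, True, "merge differential file and dump at a predetermined time")
SP3 = Protocol("SP3", False, False, "merge and checkpoint about every 24 hours")
SP4 = Protocol("SP4", False, True, SP2.checkpointing)
SP5 = Protocol("SP5", True, True, SP4.checkpointing)

CLASS_TO_PROTOCOL = {"U1": SP1, "U2": SP2, "U3": SP3,
                     "U4": SP4, "U5": SP4, "U6": SP5}


def protocol_for(update_class: str) -> Protocol:
    """Unclassified update transactions are treated as unanticipated, i.e. class U6."""
    return CLASS_TO_PROTOCOL.get(update_class, SP5)
```

Retrieval classes T1 and T2 are deliberately absent: neither keeps recovery data.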

4.5  Optimal  Policy  for  Checkpointing 

The optimization problem involves balancing the time the
system is unavailable while a checkpoint is being created
against the time it is unavailable after a failure, while
a saved version of the database is being updated from the
audit trail.

Several assumptions made in formulating this problem are
specified below:

(1) For convenience, a database formed from a relation
in Codd's [1970] relational model⁵ is used. The relation
is assumed to have a time-varying number of tuples
dependent on the update activity.

(2) For any given small interval of time, the number of
transactions processed is proportional to the length of the
interval. The constant of proportionality may differ for
different intervals, since it depends on the traffic, which
can be predetermined. Thus, for a given time interval from
$t_0$ to $t_1$, the number of transactions processed is
assumed to be $K_0 \, (t_1 - t_0)$, where $K_0$ is the
transaction processing rate in the interval $(t_0, t_1)$,
assumed constant.

(3) The average numbers of tuples read and written by a
transaction are r and w, respectively. The quantity r is
always greater than or equal to w, since every tuple
written has to be read prior to an update, but not vice
versa.

(4)  A  system  failure  may  occur  at  any  time. 

(5) The cost of checkpointing a relation is assumed to be
proportional to the number of tuples in the relation.
Let the unit cost per tuple be 'c'. Similarly, the cost
of reloading a checkpointed database is assumed to be 'E'
times the cost of checkpointing.

(6) The cost of maintaining an audit trail depends on
the sum of the number of tuples read and the number of
tuples written (the length of the entries made in the
journal). The cost of using an audit trail to recover
after a failure is 'F' times the number of entries for
tuples modified.

5: The analysis is applicable to any model of data
representation, even though a relational model is
assumed here.


Since the assumptions are independent of the size of
the database, we shall consider a single relation for our
analysis. Let $t_0, t_1, \ldots, t_k$ be points on a time
scale such that the rates of processing transactions in the
intervals $(t_0, t_1), (t_1, t_2), \ldots, (t_{k-1}, t_k)$
are different constants $K_0, K_1, \ldots, K_{k-1}$, one per
interval. Thus, the number of transactions processed during
the time period from $t_0$ to $t_k$ is:

$$\sum_{i=0}^{k-1} K_i \, (t_{i+1} - t_i)$$

Therefore, the number of tuples read and written during the
time period $t_0$ to $t_k$ is:

$$(r + w) \sum_{i=0}^{k-1} K_i \, (t_{i+1} - t_i)$$

Let us assume that a system failure occurs at time $t_f$
($t_f > t_k$).


The cost of reloading a saved version (from time $t_0$), had
checkpointing not been done at $t_k$, is proportional to the
cost of checkpointing at $t_0$ (Assumption 5):

$$C_c = E \cdot c \cdot N_0$$

where E is the save/restore checkpoint ratio, and $N_0$ is the
number of tuples checkpointed from the given relation at
time $t_0$.

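The reprocessing cost of the audit trail, $C_{au}$, for recovery
is proportional to the number of tuples updated between $t_0$
and $t_f$ (Assumption 6):

$$C_{au} = F \cdot w \cdot \left[ \sum_{i=0}^{k-1} K_i \, (t_{i+1} - t_i) + K_k \, (t_f - t_k) \right]$$

where F is the maintenance/processing audit trail ratio, and
$K_k$ is the transaction processing rate in the interval
$(t_k, t_f)$.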

Total recovery cost (if a checkpoint was not taken at $t_k$):

$$A = C_c + C_{au}$$

Total cost of recovery plus the cost of checkpointing at $t_k$
(if the database was checkpointed at $t_k$):

$$B = E \cdot c \cdot N_k + F \cdot w \cdot K_k \, (t_f - t_k) + c \cdot N_k$$

where $N_k$ is the number of tuples checkpointed at time $t_k$.

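If A > B, then it is worthwhile checkpointing at $t_k$. Since
the term $F \cdot w \cdot K_k \, (t_f - t_k)$ appears in both A
and B, this yields the condition that

$$F \cdot w \cdot \sum_{i=0}^{k-1} K_i \, (t_{i+1} - t_i) > c \cdot \left[ (E + 1) \, N_k - E \, N_0 \right].$$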

Intuitively, if the numbers of tuples added and deleted are
approximately equal, and/or if the number of tuples in the
relation or database is large compared to the number of
tuples updated, we have $N_k \approx N_0$. The right-hand
side of the condition then reduces to $c \cdot N_k$, and the
condition can be interpreted as follows: checkpointing at a
time $t_k$ is cost-effective if the cost of processing the
audit trail for recovery from the previous checkpoint to $t_k$
exceeds the cost of checkpointing at time $t_k$.
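The cost comparison above can be checked numerically. The sketch below, under Assumptions (2), (5) and (6) of this section, computes A and B directly from their definitions and also evaluates the reduced condition; the function names are hypothetical, and the parameter names simply mirror the symbols $t_i$, $K_i$, w, c, E, F, $N_0$ and $N_k$.

```python
from typing import Sequence, Tuple


def transactions_processed(times: Sequence[float], rates: Sequence[float]) -> float:
    """Sum of K_i * (t_{i+1} - t_i) over the intervals t_0 < t_1 < ... < t_k."""
    if len(times) != len(rates) + 1:
        raise ValueError("expected one rate per interval")
    return sum(k * (b - a) for k, a, b in zip(rates, times, times[1:]))


def recovery_costs(times, rates, rate_k, t_f, w, c, E, F, n0, nk) -> Tuple[float, float]:
    """Return (A, B) for a failure at t_f, without/with a checkpoint at t_k = times[-1]."""
    t_k = times[-1]
    tail = rate_k * (t_f - t_k)                     # transactions in (t_k, t_f)
    c_c = E * c * n0                                # reload the checkpoint taken at t_0
    c_au = F * w * (transactions_processed(times, rates) + tail)
    a = c_c + c_au
    b = E * c * nk + F * w * tail + c * nk          # reload from t_k, replay tail, checkpoint cost
    return a, b


def worth_checkpointing(times, rates, w, c, E, F, n0, nk) -> bool:
    """A > B reduces to this test, which depends on neither t_f nor K_k."""
    return F * w * transactions_processed(times, rates) > c * ((E + 1) * nk - E * n0)
```

For example, with times 0, 10, 20, rates 5 and 3, $K_k = 4$, $t_f = 25$, w = 2, c = 1, E = 2, F = 3 and $N_0 = N_k = 100$, recovery_costs returns (800, 420); the difference 380 equals the difference between the two sides of the reduced condition (480 - 100), so checkpointing at $t_k = 20$ pays off.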
Remarks: The optimal policy for checkpointing suggested here
for protocol SP1 is derived using a very simple model. The
assumption about the number of transactions processed in a
given interval of time is realistic, since for a small
interval the time-dependent rates $K_i$ can be estimated. This
is especially true in the environment we have considered for
system recovery, where transaction classification is done in
advance, at database design. The policy determines
checkpoints dynamically as the transactions are processed,
and differs from earlier fixed-interval approaches.

The policy is also feasible to implement because of the
parameters involved: all the necessary parameters can be
predetermined, either from information provided by the DBA
or from database usage statistics and the expected
behaviour of certain transactions. Consequently, the
approach is new and practical.

4.5.1  Audit  Trail  Maintenance  Policy 

The decision to maintain audit trails should depend on
considerations such as the following:

(i)  Necessity  of  checking  for  security  breaches; 

(ii)  Enforcing  consistency  requirements  and  authorized 
access  to  data; 

(iii) Recording on-line transactions for automatic recovery in
the case of a system crash;

(iv)  Backing  out  any  single  transaction,  and  recovering  from 
deadlock. 

Remarks: The audit trail should be physically reconstructable
if damage occurs to it. Repairing such damage is critical:
if it is not done, system recovery is invalidated. Such
recreation of the recovery data may be effected either by
duplicating the audit trail, or by using Hamming codes and
a salvager. Should related recorded data be distributed
between physically separated audit trails, or within the
same audit trail, a mechanism for synchronization is
necessary. Common transaction identification (the unique
transaction timestamp) provides a criterion for
synchronizing the distributed recovery data.
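Synchronizing distributed recovery data on the unique transaction timestamp amounts to a merge of timestamp-ordered streams. The sketch below assumes, hypothetically, that each audit trail is a list of (timestamp, action) entries already sorted by timestamp; it merges them with the standard library's `heapq.merge` and groups the entries belonging to each transaction.

```python
import heapq
from itertools import groupby
from typing import Dict, Iterable, List, Tuple

Entry = Tuple[Tuple[int, int], str]  # (transaction timestamp, recorded action)


def synchronize_trails(*trails: Iterable[Entry]) -> Dict[Tuple[int, int], List[str]]:
    """Merge audit trails that are each ordered by transaction timestamp,
    and group the entries belonging to the same transaction."""
    merged = heapq.merge(*trails, key=lambda entry: entry[0])
    return {ts: [action for _, action in group]
            for ts, group in groupby(merged, key=lambda entry: entry[0])}
```

Because the merge never needs more than one pending entry per trail, the same approach works when the trails are read sequentially from separate sites.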

4.6 Domino Effect (Global Rollback)

Verhofstad [1977] describes a recovery scheme which is
implemented for a filing system supporting a single user.
In this mechanism, recovery and crash resistance are
provided within the concept of a "recovery block"
[Randell, 1975]. In the case of a system failure or crash
inside the scope of a recovery block, the system will be
"rolled back" to the state that existed upon entry to the
recovery block. Checkpointing at the beginning of every
recovery block is done dynamically. "Commitment" at the end
of a scope, or "undoing" within or at the end of a scope,
can be achieved by invoking procedures. Verhofstad's
mechanism also provides schemes to define a scope
dynamically, and to back out on request in case of a
failure.
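For a single user, the recovery-block discipline described above maps naturally onto a Python context manager. The sketch is an analogy under simplifying assumptions of our own (the protected state is an in-memory dict, and a raised exception stands in for a failure); it is not Verhofstad's implementation, and the name recovery_block is hypothetical.

```python
import copy
from contextlib import contextmanager
from typing import Any, Dict, Iterator


@contextmanager
def recovery_block(state: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Checkpoint `state` on entry; on failure inside the block, restore it and re-raise."""
    recovery_point = copy.deepcopy(state)  # state existing upon entry to the block
    try:
        yield state
    except Exception:
        state.clear()
        state.update(recovery_point)       # "undo": roll back to the recovery point
        raise
    # Normal exit acts as commitment: the changes stand and the checkpoint is dropped.
```

With a single user nothing outside the block can have seen the intermediate state, so restoring one snapshot is sufficient; the difficulty discussed next arises precisely when other processes may have read it.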


The question of recovery in systems with multiple
concurrent updates by simultaneously executing processes has
been the topic of ongoing research. Progress has been
incomplete because of several significant problems that
arise among interacting processes.

Verhofstad [1978] has described the major difficulties faced
by the designers of System R [Astrahan et al., 1976] in
providing recovery in a multi-user environment.
Consequently, the scheme is now reportedly used for recovery
from total system failure only.

In a transaction processing system, the abnormal
termination of a process can have disastrous effects on the
consistency of the database. In an interactive environment,
the data modified by an abnormally terminating process may
have been used by other processes, which in turn perform
additional modifications to the database. This phenomenon
of a process generating additional incorrect data can
cascade throughout the database. Consequently, a large
number of processes may be operating with potentially
invalid or contaminated data. Therefore, eliminating the
effect on the database of an abnormally terminating process,
by backing it out, is an essential part of any recovery
technique. A major difficulty faced here is that backing
out one process may require backing out other processes,
creating what Randell calls a "domino effect".

The domino effect is illustrated with an example in
Figure 4.1 using three processes, $P_1$, $P_2$ and $P_3$.
Solid lines directed from left to right show the progress
of each of these processes with time. In this example each
process $P_i$ has entered four recovery blocks, but has not
yet exited any of them. The times at which processes enter
recovery blocks, referred to henceforth as recovery points,
are represented by $R_{ij}$, $1 \le j \le 4$, for every
process $P_i$. The dotted lines between processes indicate
process interactions. For instance, in Figure 4.1,
processes $P_1$ and $P_2$ have interacted at three different
times, and processes $P_2$ and $P_3$ at four different
times, the last of them at $t_7$. Should process $P_1$ now
fail at the current time $T_p$, it will have to be backed up
to its most recent recovery point $R_{14}$. No other
processes will be affected, since no interactions with
process $P_1$ have taken place between $R_{14}$ and $T_p$. On
the other hand, suppose that process $P_2$ fails at $T_p$,
resulting in backing up $P_2$ to its newest recovery point
$R_{24}$. Since $R_{24}$ precedes an interaction with process
$P_3$ (at time $t_7$), $P_3$ must be backed up to its most
recent recovery point that precedes $t_7$. However, if
process $P_3$ is to be backed up due to failure, all the
processes $P_1$, $P_2$ and $P_3$ will have to be backed up
right to the beginning of each process, thus illustrating
the domino effect in the extreme.
Figure 4.1: Domino Effect.


4.7 Transaction Processing Model

The multi-process database model consists of a set
$P = \{P_1, P_2, \ldots, P_n\}$ of processes executing
concurrently. Each process may consist of several recovery
blocks for rollback purposes, so that in the event of a
failure a process is rolled back to the beginning of its
newest recovery block.

$T = \{t_1, t_2, \ldots, t_m\}$ is the set of times at which
interactions have taken place between processes in P, such
that $t_i < t_{i+1}$ for all $1 \le i \le m-1$. Each $t_i$ in T
is associated with at least one interaction between
processes in P.

The set of recovery points of any given process $P_i$ is
represented by $R_i = \{R_{ij}\}$ for $1 \le j \le k$, where k
is the number of recovery blocks the process has entered.

An interaction tuple $(t_l, P_j)$ associated with a process
$P_i$ represents that at time $t_l$ processes $P_i$ and $P_j$
have interacted with each other. For every tuple
$(t_l, P_j)$ associated with process $P_i$, there is a tuple
$(t_l, P_i)$ associated with process $P_j$. The set of all
interaction tuples associated with all the processes is a
subset of the Cartesian product $T \times P$.


The progress of a process $P_i$ is characterized by
ordering, in the time domain, the union of the set of
interaction tuples associated with $P_i$ and $R_i$ (the set of
all recovery points of $P_i$).


For instance, from Figure 4.1, the progress of processes
$P_1$ and $P_2$ can be represented by time-ordered lists
$L_1$ and $L_2$, respectively; $L_2$, for example, begins
$(R_{21}, (t_1, P_3), (t_2, P_1), R_{22}, \ldots)$ and continues,
in time order, with the remaining interaction tuples of
$P_2$ and its recovery points $R_{23}$ and $R_{24}$.
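The progress list of a process can be computed mechanically from its recovery points and the interactions it took part in. In the sketch below the input encoding is a hypothetical choice of our own: each interaction is recorded once as (time, process, process), and the function emits the symmetric tuple for whichever side is asked about; labels such as "R[P2,1]" stand for $R_{21}$.

```python
from typing import Dict, List, Tuple

Event = Tuple[float, str, str]  # (time, process, process): one interaction


def progress(process: str, recovery_points: Dict[str, List[float]],
             interactions: List[Event]) -> List[str]:
    """Time-ordered union of a process's recovery points and interaction tuples."""
    entries = [(r, f"R[{process},{j}]")
               for j, r in enumerate(recovery_points[process], start=1)]
    for t, a, b in interactions:
        if process in (a, b):
            other = b if process == a else a  # each interaction yields a tuple for both sides
            entries.append((t, f"({t},{other})"))
    entries.sort(key=lambda e: e[0])
    return [label for _, label in entries]
```

For recovery points 0, 12 and 22 of P2 and interactions with P1 at time 15 and with P3 at time 23, the list is R[P2,1], R[P2,2], (15,P1), R[P2,3], (23,P3).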


4.8  The  Backup  Algorithm 

We present an algorithm to determine the recovery point to
which the failed process is to be backed up. The scheme
also determines how far other processes are to be backed
out. The method assumes that the information on the
interaction tuples and the progress of the processes is
available.
ALGORITHM G

Input: (i) Progress of all the processes concerned (that is,
the set of all interaction tuples, and the set of all
the recovery points);
(ii) the process $P_k$ which has failed at the current
time $T_p$.

Output: (i) Recovery point to which the process $P_k$ is to be
backed up; and
(ii) recovery points to which the processes (if any) that
may have interacted with $P_k$ are to be backed up.


/*  Data  Structures  */ 


SET:  A  set  variable  to  hold  the  processes  to  be  backed  up. 
Initially  it  is  set  to  the  process  which  has  failed. 


BACKUP($P_i$): An array of size n which holds the recovery
point to which process $P_i$ is to be backed up.
Initially every entry is set to the current time $T_p$. At
the termination of ALGORITHM G, a process $P_i$ with
BACKUP($P_i$) still set to $T_p$ needs no backing up.


PREVIOUS_BACKUP (P.) :  An  array  of  size  n,  which  holds  the 

previous  recovery  point  to  which  the  process  P.  was 

backed  up.  Therefore  PREVIOUS_B ACKUP (P • )  is  always 

greater  than  or  equal  to  BACKUP(P-).  Initially,  this 

array  is  also  set  to  the  current  time,  T  . 

P 


TNTERACTION_TIME (P ^ ) :  An  array  of  size  n,  which  holds  the 
time  at  which  process  P.  had  an  interaction  with  some 
process,  past  its  own  backup  point  BACKUP(P-). 
INTERACT ION_T I ME (P  ^ )  facilitates  choosing  tne  next 
backup  point:  of  P^  past  this  interaction. 


/*  The  Algorithm  */ 


G1: [Initialize]
      SET <- {Pk};
      for i = 1 until n do
      begin
         BACKUP(Pi) <- Tp;
         PREVIOUS_BACKUP(Pi) <- Tp;
         INTERACTION_TIME(Pi) <- Tp;
      end;

G2: [Select process for backing up]
      if SET = ∅ then STOP
      else choose any Pj ∈ SET, set SET <- SET - {Pj},
      and repeat steps G3, G4 for Pj.

G3: [Determine backup point]
      PREVIOUS_BACKUP(Pj) <- BACKUP(Pj);
      BACKUP(Pj) <- the latest recovery point Rjx such that
         Rjx < PREVIOUS_BACKUP(Pj) and
         Rjx < INTERACTION_TIME(Pj);

G4: [Find, if any, interactions with backed-up process]
      For all interaction tuples (t, Pi) associated with Pj
      such that BACKUP(Pj) < t < PREVIOUS_BACKUP(Pj), set
      SET <- SET ∪ {Pi}. For every such Pi, with tuples
      (ti1, Pi), (ti2, Pi), ..., (tik, Pi), where
      BACKUP(Pj) < ti1 < ti2 < ... < tik < PREVIOUS_BACKUP(Pj),
      set INTERACTION_TIME(Pi) <- min{ti1, INTERACTION_TIME(Pi)}.

G5: [Repeat for interacting processes (if any)]
      Go to G2.
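A minimal Python sketch of steps G1 to G5 follows. It rests on two stated assumptions. First, G3 takes the most recent admissible recovery point. Second, G4 adds a guard, t < BACKUP(Pi), that we read from the definition of INTERACTION_TIME ("past its own backup point"): an interacting process is pushed back only if its current backup point still lies after the undone interaction. The dictionaries and the function name are hypothetical. Times are integers, and a result equal to `now` (Tp) means the process needs no backing up.

```python
from typing import Dict, List, Tuple


def algorithm_g(recovery_points: Dict[str, List[int]],
                interactions: Dict[str, List[Tuple[int, str]]],
                failed: str, now: int) -> Dict[str, int]:
    """Sketch of ALGORITHM G; a result equal to `now` means no backing up."""
    backup = {p: now for p in recovery_points}            # BACKUP
    previous = {p: now for p in recovery_points}          # PREVIOUS_BACKUP
    interaction_time = {p: now for p in recovery_points}  # INTERACTION_TIME
    pending = {failed}                                    # SET (step G1)

    while pending:                                        # G2 / G5
        pj = min(pending)  # "any Pj in SET"; min() only makes runs repeatable
        pending.remove(pj)

        # G3: move Pj before its previous backup point and before the
        # earliest of its interactions that has been undone.
        previous[pj] = backup[pj]
        limit = min(previous[pj], interaction_time[pj])
        candidates = [r for r in recovery_points[pj] if r < limit]
        if not candidates:
            raise ValueError(f"{pj} has no recovery point before {limit}")
        backup[pj] = max(candidates)  # assumption: the most recent such point

        # G4: interactions of Pj that the new backup point undoes.
        for t, pi in interactions.get(pj, []):
            # Guard (assumption): Pi is affected only if its own backup
            # point still lies after the interaction.
            if backup[pj] < t < previous[pj] and t < backup[pi]:
                pending.add(pi)
                interaction_time[pi] = min(t, interaction_time[pi])
    return backup


points = {"P1": [0, 5], "P2": [0, 3, 7], "P3": [0, 4]}
tuples = {"P1": [(6, "P2")], "P2": [(2, "P3"), (6, "P1")], "P3": [(2, "P2")]}
print(algorithm_g(points, tuples, "P1", 10))  # {'P1': 5, 'P2': 3, 'P3': 10}
```

Here P1 fails at time 10 and goes back to 5. That undoes its interaction with P2 at time 6, so P2 goes back to 3. P3 is untouched because its only interaction (at time 2) precedes P2's new backup point. Without the guard, the second pass would re-add P1 because of the interaction at time 6 and push it back to 0, even though R at 5 already precedes that interaction.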


4.9  Recovery  from  Different  Failures 


This section considers recovery from the following wide
spectrum of system malfunctions:

(a) A storage failure (head crash);
(b) A system crash (software failure);
(c) A lost message;
(d) A duplicated message;
(e) A lost process (due to a system crash and subsequent recovery);
(f) Network partitioning;
(g) Operation with missing nodes.

A head crash on a disk may destroy not only the data but
also the recovery data stored in the form of an audit trail
or differential file. To survive such a crash, the recovery
data are themselves duplicated, by keeping duplicate audit
trails or differential files. In practice, one must
ultimately rely on some final copy of the recovery data,
while recognizing that no recovery data provide 100%
reliability. Given intact recovery data, the data damaged in
a head crash can be recovered by loading a backup version and
reprocessing from the audit trails.
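Loading a backup version and reprocessing from the audit trail can be sketched as follows. This assumes a hypothetical audit-trail format that records, for each update, a timestamp, the data item and its value after the update (an after-image). Only updates later than the backup are reapplied, in time order.

```python
from typing import Dict, List, Tuple

# Hypothetical audit-trail record: (timestamp, data item, value after the update).
AuditRecord = Tuple[int, str, int]


def restore(backup: Dict[str, int], backup_time: int,
            audit_trail: List[AuditRecord]) -> Dict[str, int]:
    """Reload the backup version, then reprocess later updates in time order."""
    data = dict(backup)
    for timestamp, item, after_image in sorted(audit_trail):
        if timestamp > backup_time:  # earlier updates are already in the backup
            data[item] = after_image
    return data


print(restore({"x": 1, "y": 2}, 10, [(5, "x", 1), (12, "y", 3), (15, "x", 4)]))
# {'x': 4, 'y': 3}
```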

A system crash may leave the data at the site of the
failure both internally inconsistent and mutually
inconsistent with the data at other sites. After the site
comes up, the internal consistency of the data is restored
either by reprocessing the transactions that were active at
the time of the crash, or by backing out certain
transactions. The principal mechanism that helps restore
mutual consistency (i.e., re-integrating the site into the
system) is called "persistent communication". This
mechanism is a clean way of accomplishing re-integration of
the site. Different forms of persistent communication have
been used [Ellis, 1977; Thomas, 1977; Hammer and Shipman, 1978].
Message spoolers can be used to maintain mutual consistency:
the messages addressed to a site while it is down are stored
and delivered to it when it comes up.
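A message spooler can be sketched in Python as a per-site queue that holds messages while the destination is down and replays them in order on recovery. This is a hypothetical in-memory stand-in. A real spooler would keep its queues on stable storage at an alternative site, and `deliver` represents whatever transport the network provides.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Set


class MessageSpooler:
    """Hypothetical in-memory stand-in for a message spooler at another site."""

    def __init__(self, deliver: Callable[[str, str], None]) -> None:
        self.deliver = deliver
        self.down: Set[str] = set()
        self.spool: Dict[str, Deque[str]] = defaultdict(deque)

    def send(self, site: str, message: str) -> None:
        if site in self.down:
            self.spool[site].append(message)  # hold it until the site is back
        else:
            self.deliver(site, message)

    def site_failed(self, site: str) -> None:
        self.down.add(site)

    def site_recovered(self, site: str) -> None:
        self.down.discard(site)
        pending = self.spool.pop(site, deque())
        while pending:  # replay in the order the messages were sent
            self.deliver(site, pending.popleft())


delivered: List[str] = []
spooler = MessageSpooler(lambda site, msg: delivered.append(f"{site}:{msg}"))
spooler.send("B", "u1")
spooler.site_failed("B")
spooler.send("B", "u2")
spooler.send("B", "u3")
spooler.site_recovered("B")
print(delivered)  # ['B:u1', 'B:u2', 'B:u3']
```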

A lost message can similarly be recovered from the
message spoolers located at alternative sites. This problem
has been handled in a variety of ways in the literature.
Lampson and Sturgis [1975] require minimizing the likelihood
of a failure during what they call a "reliable broadcast",
after the updates are committed but before they are recorded
for recovery. This is handled by requiring that the
recovery data be recorded using the write-ahead log protocol,
as explained earlier.
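The write-ahead rule requires that the recovery record for an update reach stable storage before the updated data do. If a crash falls between the two steps, a log record exists for an update that may or may not have been applied, and that can be handled. The reverse order could leave an applied update with no record at all. The toy Python model below shows the ordering and an undo that uses before-images. The class name is hypothetical, and the `log` list only stands in for stable storage.

```python
from typing import Dict, List, Optional, Tuple

# (transaction id, data item, value before the update, value after the update)
LogRecord = Tuple[int, str, Optional[int], int]


class WriteAheadStore:
    """Toy model of the write-ahead log rule; `log` stands in for stable storage."""

    def __init__(self) -> None:
        self.data: Dict[str, int] = {}
        self.log: List[LogRecord] = []

    def update(self, txn: int, item: str, value: int) -> None:
        self.log.append((txn, item, self.data.get(item), value))  # 1. log first
        self.data[item] = value                                   # 2. then data

    def undo(self, txn: int) -> None:
        """Back out `txn` by restoring its before-images, newest first."""
        for t, item, before, _after in reversed(self.log):
            if t != txn:
                continue
            if before is None:
                self.data.pop(item, None)  # the item did not exist before
            else:
                self.data[item] = before


store = WriteAheadStore()
store.update(1, "x", 5)
store.update(2, "x", 7)
store.undo(2)
print(store.data)  # {'x': 5}
```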

The problem is more serious when communication failures,
even brief ones, split the network into isolated subclusters
of computers. Each separated segment continues its
operation, without any chance for the isolated pieces to
coordinate their activities. Consequently, when
communication is restored the fragments are inconsistent.

Transactions run in a partitioned network are simply
incorrect in the context of the total network. Thus,
trying to interleave their actions after the fact to restore
consistency is futile. The correct action depends on the
database semantics, the topology of the partition, and the
actions of the transactions. A complete technical solution
without human decision making does not seem feasible.

System operation with missing (failed) sites is
necessary to avoid delaying transactions until recovery
occurs. This can be accomplished by recording, in message
spoolers, the update messages destined for the failed sites,
so that mutual consistency can be restored when they recover.
Lampson and Sturgis use the concept of an "intentions list"
(non-volatile storage where all updates are recorded),
together with the fact that locks which have been set leave
a lingering recollection of a failed site until it recovers.
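An intentions list is commonly described as follows. Updates are first recorded without being applied. A single commit step then makes them take effect, and carrying them out is idempotent, so a site that crashed part way through can simply repeat the work when it recovers. The Python sketch below follows that general description only. It is not Lampson and Sturgis's exact protocol, and the class and method names are hypothetical.

```python
from typing import Dict, List, Tuple


class IntentionsList:
    """Sketch of an intentions list kept in (assumed) non-volatile storage."""

    def __init__(self) -> None:
        self.intentions: List[Tuple[str, int]] = []
        self.committed = False

    def record(self, item: str, value: int) -> None:
        self.intentions.append((item, value))  # nothing is changed yet

    def commit(self) -> None:
        self.committed = True  # single point at which the updates take effect

    def carry_out(self, data: Dict[str, int]) -> None:
        """Apply the recorded updates; safe to repeat after a crash."""
        if not self.committed:
            return
        for item, value in self.intentions:
            data[item] = value  # writing absolute values makes this idempotent


il = IntentionsList()
il.record("x", 1)
il.record("y", 2)
db: Dict[str, int] = {}
il.carry_out(db)  # not committed: nothing happens
il.commit()
il.carry_out(db)
il.carry_out(db)  # repeating after a crash changes nothing
print(db)  # {'x': 1, 'y': 2}
```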

4.10  Discussion 

Not every issue involved in system recovery is
discussed rigorously in this chapter. Transactions have been
subdivided into classes depending on their recovery
requirements, and appropriate system recovery protocols for
the transaction classes have been proposed. We have stressed
the importance of the availability of the on-line system at
all times, and advocate that the recovery overhead should
depend directly on the value or sensitivity of the data
accessed. To accomplish this we have used our knowledge of
the nature of all the transactions that access the data.
Presumably the knowledge available about the concurrent
processes could also be used in designing optimal algorithms
for several other issues involved in the design of
distributed databases. Experiments on simulated or
implemented distributed databases, and analysis of their
results, are necessary to provide a better assessment of the
techniques. Performance measurements should be the next big
step in distributed database research.
CHAPTER 5


CONCLUDING  REMARKS 

5.1  Summary  of  Results  Obtained 

5.1.1  Deadlocks 
The concept of "on-line" deadlock detection in distributed
information networks is introduced. It is defined to be the
process of recognizing deadlock occurrence as soon as it
happens, at the installation which makes the resource
allocation decision, without the necessity of further
communication for every request made or granted. An on-line
detection algorithm is suggested and developed. All of the
earlier algorithms restrict a process to having at most one
outstanding request. In our approach, such a restriction is
removed in view of the fact that in real-world applications
more than one outstanding request is a certainty. This leads
to a situation in which an allocation decision on a data
resource (with multiple waiting-access requests) released by
a completing process could lead to a deadlock. For this
case, an elegant solution which combines the principles of
detection and avoidance is shown, in which a potential
deadlock is detected and avoided. Because requests in
databases are data-driven and content-based, the possibility
of multiple outstanding requests is high. Thus, the results
and the mixed solution in this case are new and original.
Besides its low level of communication activity, the
approach has several other major advantages, as outlined in
Section 3.9.
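The detect-then-avoid idea for a released resource with several waiters can be illustrated generically. Before granting the resource, check whether the grant would close a cycle in the wait-for graph, and grant it to a waiter for which it would not. The sketch below is not the on-line algorithm of Chapter 3. It assumes a single site sees all holders and outstanding requests, which is exactly the global view the on-line approach avoids gathering. All names are hypothetical.

```python
from typing import Dict, Optional, Set


def has_cycle(graph: Dict[str, Set[str]]) -> bool:
    """Depth-first search for a cycle in a wait-for graph."""
    state: Dict[str, str] = {}  # "open" while on the DFS path, "done" afterwards

    def visit(node: str) -> bool:
        state[node] = "open"
        for succ in graph.get(node, set()):
            if state.get(succ) == "open":
                return True
            if succ not in state and visit(succ):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in list(graph))


def wait_for(holder: Dict[str, Optional[str]],
             waiting: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Edge P -> Q when P has an outstanding request for a resource Q holds."""
    graph: Dict[str, Set[str]] = {p: set() for p in waiting}
    for p, wanted in waiting.items():
        for r in wanted:
            q = holder.get(r)
            if q is not None and q != p:
                graph[p].add(q)
    return graph


def choose_grantee(released: str, holder: Dict[str, Optional[str]],
                   waiting: Dict[str, Set[str]]) -> Optional[str]:
    """Pick a waiter for `released` whose grant closes no cycle, else None."""
    for cand in sorted(p for p, wanted in waiting.items() if released in wanted):
        new_holder = dict(holder)
        new_holder[released] = cand
        new_waiting = {p: set(w) for p, w in waiting.items()}
        new_waiting[cand].discard(released)
        if not has_cycle(wait_for(new_holder, new_waiting)):
            return cand
    return None


holder = {"r": None, "s": "P2"}             # r has just been released
waiting = {"P1": {"r", "s"}, "P2": {"r"}}   # P1 has two outstanding requests
print(choose_grantee("r", holder, waiting))  # P2
```

Granting r to P1 would leave P1 waiting for s (held by P2) while P2 waits for r (now held by P1), so the sketch skips P1 and grants r to P2. This situation arises only because P1 has more than one outstanding request.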

5.1.2  System  Recovery 

Another aspect of the problem considered is the
reliable operation of database systems partitioned and/or
replicated over a network of computers. The design of a
method which maintains database consistency during system
update and recovery is guided by the goals of simplicity,
tolerable overhead, partial operability, and avoidance of
global rollback. In this new approach, retrieval and update
transactions are subdivided into classes, and recovery
protocols are defined which take advantage of the known
properties of each transaction class. An optimal policy for
checkpointing in a particular recovery protocol is derived
using a simple model. The policy determines the checkpoint
dynamically as the transactions are processed, and differs
from earlier fixed-interval approaches. All the parameters
involved can be predetermined owing to the transaction
classification facility, database usage statistics and the
expected behaviour of certain transactions. Consequently,
the approach is new and practical. The cascading effect of
a global rollback is modeled by using the progress of
processes, represented by a set of interaction tuples and
recovery points ordered in the time domain. A backup
algorithm based on this model is developed. Recovery
aspects for a wide set of system failures are considered and
several partial solutions are outlined.

5.2  Significance  and  Motivation 

With the growing use of terminal-oriented computer
systems, and the increasing trend among commercial firms
toward real-time operations, especially those involving
databases, database-oriented operating systems face heavy
demands. A characteristic of such contemporary systems is
their high degree of resource and data sharing.
Consequently, the possibility of deadlocks increases, and
the problem of system recovery from crashes is intensified.


In our view, deadlock prevention schemes are not
justifiable for use in distributed databases. Extensive
coordination and substantial communication are necessary
before process initiation for processes that cannot be
shown to be nonconflicting. This lowers the degree of
concurrency and hence system performance. The past use of
prevention principles was acceptable because of the low
levels of concurrency in systems rather than any inherent
superiority.

We advocate the use of deadlock detection in
distributed systems [Isloor and Marsland, 1978; Marsland and
Isloor, 1978]. Our views are also supported in recent
literature. When dealing with concurrent database accesses,
little is known about the probability of interference or
deadlock. For transaction processing systems, it is firmly
believed that interference is rare and that elaborate
avoidance or prevention mechanisms would not be economical
[Peebles and Manning, 1978]. Further, to quote [Le Lann,
1978] again, "our conclusion will be that for systems which
include a partitioned database and which provide for storage
of pending requests, maintenance of internal integrity boils
down to a problem of deadlock avoidance or detection with
distributed control". As a consequence, and in view of the
present-day trend towards increased concurrent access in
systems, the on-line detection technique is a step toward
increasing user confidence in distributed systems. The
existing algorithms for deadlock detection in distributed
databases cannot be used for on-line detection because:
(i) for every request, granted or not, these algorithms need
to obtain the global network status by simultaneous
communication of the status of each installation, which
leads to an enormous amount of communication in the
case of on-line detection;

(ii) such heavy communication traffic results in
synchronization problems due to communication delays, in
which either a deadlock is indicated where one no longer
exists, or an existing deadlock goes undetected; and

(iii) after obtaining the complete network status, the
algorithms still have to perform computations to detect a
deadlock.

In  a  distributed  environment,  a  longer  delay  in  the 
detection  of  deadlocks  can  have  disastrous  effects  on  the 
consistency of the database. Detecting deadlocks on-line,
closer to the source and instant of occurrence, greatly
enhances the opportunity for timely corrective action.

The  basic  strategy  expounded  in  the  design  of  recovery 
protocols  consists  of  discriminating  between  update  and 
retrieval  transactions,  and  sub-classifying  them  in  such  a 
way  that  recovery  protocols  can  be  developed  for  each  type 
of  transaction.  It  is  believed  that  for  transaction 
processing systems over 95-99% of transactions fall into the

It is difficult to estimate the performance effects of
deadlock handling techniques, or the probability of
occurrence of deadlocks in distributed databases, since
communication time is a critical factor. Because of the
increased complexity of distributed databases, operational
efficiency is a significant factor in handling deadlocks.
Distributed databases will probably have to become a
commercial reality, so that experimental data can be
gathered to measure performance and probabilities, before
the effect of communication can be estimated. To estimate
the performance of the on-line detection scheme, future
researchers should look to appropriate simulation, testing,
and interpretation.

The probabilistic model of deadlocks [Ellis, 1974]
considered no more than 5 resources and processes, which is
trivial by commercial standards. More research is clearly
needed in that area: a comprehensive probabilistic model of
deadlocks in large computer systems is missing from the
literature. Further research is also warranted to extend any
such comprehensive model to systems with consumable
resources.

A comprehensive combined approach to deadlocks in
database systems or distributed databases would probably be
a welcome step. Deadlock prevention techniques are advisable
for processes accessing highly secure data in a concurrent
system, whereas detection and avoidance principles can be
used for less important processes. The literature does not
yet include solutions to deadlock problems involving
processes accessing highly classified data in an integrated
database. Research is necessary to determine an efficient
and effective method of rolling back a process; such a
mechanism would make deadlock detection techniques much more
attractive.

As more and more data are integrated over a network of
computers, and databases become accessible to a larger
number of diverse application jobs, the complexity of the
function of the database administrator (DBA) increases
enormously. The actions of the operating system (which
manages application jobs) and those of the DBA (who
maintains process integrity and the consistency of the
database) have to be coordinated. It is necessary that
both the DBA and the operating system have a thorough
understanding of the relationships among concurrency
controls, processors, processes, deadlock handling and
recovery techniques, communication aspects and protocols.

The need for such coordination may call for a higher
authority over both the DBA and the operating system. This
area calls for deeper study, especially in a network
environment.

Formal development and analysis of the proposed recovery
protocols are necessary. Such analysis should presumably
use the knowledge available about the concurrent processes.
Performance measurements on simulated or implemented
systems, following the analysis, should be the next big
step in distributed database research.

The detection and categorization of errors and failures
are open areas for research. The opportunity for timely
correction of errors is greatly enhanced if they are
detected closer to the source of their occurrence.
Propagation of errors is a major problem and is severe in a
network environment. This calls for immediate detection at
the source to maintain a high degree of performance. The
extent to which hardware and software recoverability
techniques can be applied to speed up system recovery needs
further research.
APPENDIX A


AN  EXAMPLE  OF  "CYCLIC  RESTART" 


Certain prevention methods require a blocked process to
release a resource it holds when that resource is requested
by an active process. Also, an active process gets blocked
when it requests a resource held by another active process.
In some peculiar situations in database systems, this scheme
is subject to "cyclic restart", in which two or more
processes get entangled perpetually by continually blocking,
aborting, and restarting each other. We demonstrate this
phenomenon with an example.


[Figure: time lines of processes P1 and P2 with their read (r)
and write (w) accesses to entities A, B and D, time increasing
to the right. Annotations: P1 starts; P2 starts; P1 waits, P1
restarts; P2 waits, P2 restarts; CONFIGURATION REPEATS.]
Consider two processes P1 and P2, both starting at time t.
Processes P1 and P2 read and write entities {A, D} and B,
respectively. At time t1 process P1 requests access to
entity B and gets blocked, whereas shortly afterwards P2
requests access to D, resulting in the abortion and
restarting of P1, and so on. Eventually the situation
becomes identical to an earlier one, and the configuration
repeats. This cycle tends to self-synchronize and to repeat
indefinitely, notwithstanding minor variations in system
timing.
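The two steps of the example that are described in the text can be replayed with a small Python model of the prevention rule. A request for a resource held by a blocked process aborts and restarts that holder. A request for a resource held by an active process blocks the requester. The data layout and the function name are hypothetical. A restarted process is modelled simply as active and holding nothing, and the later steps shown in the figure are not modelled.

```python
from typing import Dict, Set


def request(proc: str, resource: str,
            holds: Dict[str, Set[str]], blocked: Set[str]) -> str:
    """Apply the prevention rule described above to one access request."""
    holder = next((p for p, rs in holds.items() if resource in rs), None)
    if holder is None or holder == proc:
        holds[proc].add(resource)
        return f"{proc} gets {resource}"
    if holder in blocked:
        # A blocked holder must give way: it is aborted, releases everything,
        # and restarts (modelled here as becoming active with nothing held).
        holds[holder].clear()
        blocked.discard(holder)
        holds[proc].add(resource)
        return f"{holder} restarts; {proc} gets {resource}"
    blocked.add(proc)  # an active holder keeps the resource; the requester waits
    return f"{proc} waits for {resource}"


holds = {"P1": {"A", "D"}, "P2": {"B"}}
blocked: Set[str] = set()
print(request("P1", "B", holds, blocked))  # P1 waits for B
print(request("P2", "D", holds, blocked))  # P1 restarts; P2 gets D
```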
APPENDIX B


AN  EXAMPLE  FOR  MULTIPLE  OUTSTANDING  REQUESTS 

To show the possibility of more than one outstanding
request per process, we choose the "Presidential Data Base"
[Chamberlin, 1976] in the relational model of data [Codd, 1970].
The relations in the database are in Third Normal Form
[Codd, 1971a]. A query to the database, which causes two
simultaneous requests, is expressed in the Data Sub-Language
ALPHA [Codd, 1971b].

The  relations  in  the  database  and  the  query  are: 

R1: ELECTIONS_WON (YEAR, WINNER_NAME, WINNER_VOTES)

R2: PRESIDENTS (NAME, PARTY, HOME_STATE)

R3: ELECTIONS_LOST (YEAR, LOSER_NAME, LOSER_VOTES)

R4: LOSERS (NAME, PARTY)

Query:  (a)  (In  English)  List  the  election  years  in 
which  a  Republican  from  Illinois  was  elected. 

The  intent  of  this  query  in  English  is:  Retrieve  the 
YEAR  attribute  from  any  tuple  of  relation  ELECTIONS_WON 
whose  WINNER_NAME  attribute  matches  the  NAME  attribute 
of  a  tuple  of  relation  PRESIDENTS  if  that  tuple  of 
relation  PRESIDENTS  has  a  HOME_STATE  of  ILLINOIS  and  a 
PARTY  equalling  REPUBLICAN. 

(b) (In DSL ALPHA)

      RANGE PRESIDENTS P
      RANGE ELECTIONS_WON E

      GET W E.YEAR: ∃P (P.NAME = E.WINNER_NAME AND
                        P.PARTY = 'REPUBLICAN' AND
                        P.HOME_STATE = 'ILLINOIS')

For the execution of this query it is essential to gain
access to the relations R1 and R2 at the same time, causing
two potential outstanding requests.
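Why both relations are needed at once becomes concrete if the query is evaluated by nested iteration. For every ELECTIONS_WON tuple E, the PRESIDENTS tuples are scanned for a P that satisfies the qualification, which mirrors the existential quantifier ∃P. The Python sketch below is one possible evaluation strategy, not DSL ALPHA's. The names and vote counts in the usage example are hypothetical placeholders, not the contents of Chamberlin's database.

```python
from typing import List, NamedTuple


class ElectionWon(NamedTuple):   # ELECTIONS_WON
    year: int
    winner_name: str
    winner_votes: int


class President(NamedTuple):     # PRESIDENTS
    name: str
    party: str
    home_state: str


def republican_illinois_years(elections: List[ElectionWon],
                              presidents: List[President]) -> List[int]:
    """Evaluate the query above by nested iteration over both relations."""
    # For each E, test whether some P satisfies the qualification; while
    # this runs, both relations are in use at the same time.
    years = {
        e.year
        for e in elections
        if any(p.name == e.winner_name and p.party == "REPUBLICAN"
               and p.home_state == "ILLINOIS" for p in presidents)
    }
    return sorted(years)


presidents = [President("SMITH", "REPUBLICAN", "ILLINOIS"),
              President("JONES", "DEMOCRAT", "ILLINOIS"),
              President("BROWN", "REPUBLICAN", "OHIO")]
elections = [ElectionWon(1900, "SMITH", 100), ElectionWon(1904, "JONES", 110),
             ElectionWon(1908, "BROWN", 120), ElectionWon(1912, "SMITH", 130)]
print(republican_illinois_years(elections, presidents))  # [1900, 1912]
```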


